SCAN-O-MATIC PHENOMICS

We screened a CRISPR interference (CRISPRi) library of >9000 Saccharomyces cerevisiae strains in which >98% of all essential and respiratory growth-essential genes were targeted with multiple gRNAs. The screen was performed on the high-throughput, high-resolution scan-o-matic platform (Zackrisson et al., 2016), where each strain is analyzed separately so that high-resolution growth curves are generated without influence or competition from other strains.

ACETIC ACID TITRATION

In an ideal library screen (with all strains derived from the same background), the phenotypes under a given test condition should be approximately normally distributed and vary widely, so that the best and the worst performers in the library can be picked out. We therefore needed a stressor concentration (in this case of acetic acid) severe enough to induce large phenotypic variability, while still allowing most strains to grow and yield a quantitative phenotype. To identify an appropriate concentration for the whole-library screen, Plates 7 and 8 were pre-screened at different acetic acid concentrations (0, 50mM, 75mM, 100mM, and 150mM). Unfortunately, the spatial control strain at this point was BY4741, which did not grow at 150mM (BY4741 was later replaced with CC23, one of the CRISPRi control strains, which carries a gRNA non-homologous to the Saccharomyces cerevisiae genome). To compare our results we therefore used the absolute generation time (without any normalization for spatial bias). We assumed that the phenotypic variability due to spatial bias is very similar across the test plates; since at this point we only look at the phenotypic variability among the strains, it should not severely influence the conclusion of this titration round. The data are available in the COMPILED_DATA folder.

Acetic acid titration data : 20210120_AA_titration_absolute_compiled.csv

  • Import the data
AA_titration_data <- read.csv("COMPILED_DATA/20210120_AA_titration_absolute_compiled.csv", na.strings = "NoGrowth")
  • Install packages: of these, ggplot2 and reshape will be used frequently later for data visualization

  • ggplot2

  • reshape

  • ggridges

  • Reshape the data into the long format required by the ggplot2 package

AA_titration_data_reshape <- reshape(data=AA_titration_data, idvar="gRNA_name",
                                     varying = colnames(AA_titration_data)[3:7],
                                     v.names = "Generation_time",
                                     new.row.names = 1:30000,
                                     direction="long",
                                     timevar = "Condition",
                                     times = colnames(AA_titration_data)[3:7])
  • Plot the ridgeline plots: a convenient way to compare the density traces of multiple datasets
Figure 1: Density trace of the absolute generation time of strains in plates 7 and 8 at different concentrations of acetic acid
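
The plotting call itself is not shown in this section. The following is a minimal sketch of the reshape and ridgeline steps on made-up data, assuming the ggplot2 and ggridges packages are installed. The table toy_wide, its column names (AA_0mM, AA_75mM, AA_150mM) and all values are hypothetical and only stand in for the titration data; the plot produced here is not the original Figure 1.

```r
# Made-up wide table: one row per strain, one column per concentration.
set.seed(1)
n <- 40
toy_wide <- data.frame(
  gRNA_name = paste0("g", seq_len(n)),
  Plate     = rep(c(7, 8), each = n / 2),
  AA_0mM    = rnorm(n, mean = 2.5, sd = 0.2),
  AA_75mM   = rnorm(n, mean = 4.0, sd = 0.5),
  AA_150mM  = rnorm(n, mean = 9.0, sd = 1.5),
  stringsAsFactors = FALSE
)
toy_wide$AA_150mM[1] <- NA  # stands in for a "NoGrowth" entry

# Wide to long: one row per strain and condition.
conc_cols <- colnames(toy_wide)[3:5]
toy_long <- reshape(
  data          = toy_wide,
  idvar         = "gRNA_name",
  varying       = conc_cols,
  v.names       = "Generation_time",
  timevar       = "Condition",
  times         = conc_cols,
  new.row.names = seq_len(nrow(toy_wide) * length(conc_cols)),
  direction     = "long"
)

# Order the ridges by concentration rather than alphabetically.
toy_long$Condition <- factor(toy_long$Condition, levels = conc_cols)

# Draw one density ridge per condition (skipped if the packages are missing).
ridge_plot <- NULL
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggridges", quietly = TRUE)) {
  plot_data <- toy_long[!is.na(toy_long$Generation_time), ]
  ridge_plot <- ggplot2::ggplot(
    plot_data,
    ggplot2::aes(x = Generation_time, y = Condition)
  ) +
    ggridges::geom_density_ridges() +
    ggplot2::labs(x = "Absolute generation time",
                  y = "Acetic acid concentration")
  print(ridge_plot)
}
```

With the real data, AA_titration_data_reshape would take the place of toy_long.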

CONCLUSION OF ACETIC ACID TITRATION

At 150mM we observed the largest phenotypic variability within the strains of plate 7 and 8. Therefore, 150mM was the selected acetic acid concentration to screen the entire library.

IMPORT SCAN-O-MATIC RAW DATA

The phenotypic data generated in the scan-o-matic screen are exported in .csv format. We extract both the absolute and the normalized phenotypes.

The CRISPRi strains in the library were arrayed on 24 plates in 384 format. Each CRISPRi plate was subjected to two different conditions (basal and 150 mM acetic acid). With an absolute and a normalized dataset for each condition, four files are therefore generated per plate. All files generated in a single independent experimental round are stored in a single folder.

  • SOM_SCR_R001 : Raw data for round1

  • SOM_SCR_R002 : Raw data for round2

ABSOLUTE DATA

The Absolute dataset gives the extracted phenotypes without any spatial normalization.

NORMALIZED DATA

The Normalized dataset is generated after removal of any spatial bias. Its values are on a log2 scale and are referred to as Log Strain Coefficient (LSC) values.

FILE NAMING

Each file name contains the plate identifier, so that files can be selected programmatically.

For example, the absolute data for Plate 1 in the basal (Ctrl) condition contains the string
Ctrl1.phenotypes.Absolute
and the normalized data for Plate 1 under acetic acid (aa) stress contains the string
aa1.phenotypes.Normalized
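
To illustrate how this naming scheme can be used programmatically, the sketch below builds the search string for a given plate, condition and dataset type and tests it against a few made-up file names. The helper som_pattern() and the example file names (including the .csv suffix) are assumptions for illustration only. The dots are escaped here so that they match literally when the string is used as a regular expression.

```r
# som_pattern() is a hypothetical helper, not part of scan-o-matic.
som_pattern <- function(plate,
                        condition = c("Ctrl", "aa"),
                        type = c("Absolute", "Normalized")) {
  condition <- match.arg(condition)
  type <- match.arg(type)
  # Escape the dots so they are matched literally in a regular expression.
  paste0(condition, plate, "\\.phenotypes\\.", type)
}

# Made-up file names that follow the convention described above.
example_files <- c("Ctrl1.phenotypes.Absolute.csv",
                   "Ctrl10.phenotypes.Absolute.csv",
                   "aa1.phenotypes.Normalized.csv")

# Only the Plate 1 basal absolute file is selected.
hits <- grep(som_pattern(1, "Ctrl", "Absolute"), example_files, value = TRUE)
print(hits)
```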

PURPOSE 1

At the end of this data import session, a single data.frame containing the data of all 24 plates will have been generated. The whole dataset is labeled with the strain attributes using the metadata key file (provided in the COMPILED_DATA folder). The import below is shown only for the Round2 dataset; Round1 can be generated by changing the folder location.

METADATA KEY FILE : library_keyfile1536.csv

IMPORTING THE METADATA FILE

Metadata_CRISPRi <- read.csv("COMPILED_DATA/library_keyfile1536.csv", na.strings = "#N/A", stringsAsFactors = FALSE)
str(Metadata_CRISPRi)
## 'data.frame':    36864 obs. of  11 variables:
##  $ SL_No             : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ gRNA_name         : chr  "FBP26-TRg-1" "FBP26-TRg-1" "HMI1-NRg-1" "HMI1-NRg-1" ...
##  $ Seq               : chr  "GCTTATCATACATTTACATC" "GCTTATCATACATTTACATC" "AAAAATTCTGACACATCACA" "AAAAATTCTGACACATCACA" ...
##  $ SOURCEPLATEID     : chr  "R2877.H.001" "R2877.H.001" "R2877.H.001" "R2877.H.001" ...
##  $ SOURCEDENSITY     : chr  "384A" "384A" "384A" "384A" ...
##  $ SOURCECOLONYCOLUMN: int  1 1 2 2 3 3 4 4 5 5 ...
##  $ SOURCECOLONYROW   : chr  "A" "A" "A" "A" ...
##  $ border            : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ GENE              : chr  "FBP26" "FBP26" "HMI1" "HMI1" ...
##  $ Control.gRNA      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Location_1536     : chr  "A1" "A2" "A3" "A4" ...

GENERATE BASAL ABSOLUTE DATASET

m <- vector(mode = "character", length = 0)
file.names<-vector(mode = "character", length = 0)
temp_df<-data.frame()
data_Ctrl_Abs <- data.frame()
for(i in 1:24){
  m <- paste0("Ctrl", i, ".phenotypes.Absolute") 
  file.names[i] <- dir("RAW_DATA/SOM_SCR_R002/", pattern = m, full.names = TRUE)
  temp_df <- read.csv(file.names[i], na.strings = "NoGrowth")
  data_Ctrl_Abs <- rbind(data_Ctrl_Abs, temp_df)
}
str(data_Ctrl_Abs)
## 'data.frame':    36864 obs. of  18 variables:
##  $ Plate                                   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Row                                     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Column                                  : int  0 1 2 3 4 5 6 7 8 9 ...
##  $ Phenotypes.InitialValue                 : num  80116 83495 92467 97104 99622 ...
##  $ Phenotypes.ExperimentBaseLine           : num  81419 84504 94039 98641 101354 ...
##  $ Phenotypes.ExperimentEndAverage         : num  8615209 6710686 6037319 5554230 5234758 ...
##  $ Phenotypes.ColonySize48h                : num  6487316 5329111 4946774 4684166 4440201 ...
##  $ Phenotypes.ChapmanRichardsParam2        : num  14.5 12.8 42.7 17.6 18.3 ...
##  $ Phenotypes.ChapmanRichardsParam3        : num  -2.74 -2.65 -2.58 -2.52 -2.47 ...
##  $ Phenotypes.ChapmanRichardsParamXtra     : num  16 16 16.2 16.3 16.3 ...
##  $ Phenotypes.ChapmanRichardsParam1        : num  1.95 1.89 1.84 1.8 1.77 ...
##  $ Phenotypes.ChapmanRichardsParam4        : num  -31.33 -31.47 -3.93 -2.78 -2.48 ...
##  $ Phenotypes.GenerationTimeStErrOfEstimate: num  0.012879 0.002174 0.000633 0.001667 0.000759 ...
##  $ Phenotypes.ExperimentGrowthYield        : num  8533790 6626182 5943280 5455589 5133404 ...
##  $ Phenotypes.GenerationTime               : num  2.53 2.48 2.51 2.56 2.53 ...
##  $ Phenotypes.ExperimentPopulationDoublings: num  6.73 6.31 6 5.82 5.69 ...
##  $ Phenotypes.ChapmanRichardsFit           : num  0.999 0.999 0.999 0.999 0.999 ...
##  $ Phenotypes.GenerationTimeWhen           : num  6.14 4.1 4.1 3.76 3.76 ...

Several phenotypes are extracted. However, the most useful for this study are:

  • Column No: 14 i.e. Phenotypes.ExperimentGrowthYield
  • Column No: 15 i.e. Phenotypes.GenerationTime

Extract only these two columns into the final data.frame and rename them to prevent any ambiguity.

data_Ctrl_Abs_Trim <- data_Ctrl_Abs[, 14:15]
colnames(data_Ctrl_Abs_Trim) <- c("CTRL_Y_ABS", "CTRL_GT_ABS")
str(data_Ctrl_Abs_Trim)
## 'data.frame':    36864 obs. of  2 variables:
##  $ CTRL_Y_ABS : num  8533790 6626182 5943280 5455589 5133404 ...
##  $ CTRL_GT_ABS: num  2.53 2.48 2.51 2.56 2.53 ...
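
Selecting by position (14:15) depends on the column order of the export. As a hedged alternative, the sketch below selects the same two phenotypes by name and stops early if either is missing. The table toy_abs is a made-up stand-in for data_Ctrl_Abs that contains only a few of its columns.

```r
# Toy stand-in for data_Ctrl_Abs with a few of the columns shown above.
toy_abs <- data.frame(
  Plate = c(0, 0),
  Phenotypes.ExperimentGrowthYield = c(8533790, 6626182),
  Phenotypes.GenerationTime = c(2.53, 2.48)
)

# New name = original column name.
wanted <- c(CTRL_Y_ABS  = "Phenotypes.ExperimentGrowthYield",
            CTRL_GT_ABS = "Phenotypes.GenerationTime")

# Fail early if a column is missing from the export.
stopifnot(all(wanted %in% colnames(toy_abs)))

toy_trim <- toy_abs[, wanted]
colnames(toy_trim) <- names(wanted)
str(toy_trim)
```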

GENERATE ACETIC ACID ABSOLUTE DATASET

Following the same strategy as above

m <- vector(mode = "character", length = 0)
file.names<-vector(mode = "character", length = 0)
temp_df<-data.frame()
data_AA_Abs <- data.frame()
for(i in 1:24){
  m <- paste0("aa", i, ".phenotypes.Absolute") 
  file.names[i] <- dir("RAW_DATA/SOM_SCR_R002/", pattern = m, full.names = TRUE)
  temp_df <- read.csv(file.names[i], na.strings = "NoGrowth")
  data_AA_Abs <- rbind(data_AA_Abs, temp_df)
}
data_AA_Abs_Trim <- data_AA_Abs[, 14:15]
colnames(data_AA_Abs_Trim) <- c("AA_Y_ABS", "AA_GT_ABS")

GENERATE BASAL NORMALIZED DATASET

m <- vector(mode = "character", length = 0)
file.names<-vector(mode = "character", length = 0)
temp_df<-data.frame()
data_Ctrl_Norm <- data.frame()
for(i in 1:24){
  m <- paste0("Ctrl", i, ".phenotypes.Normalized") 
  file.names[i] <- dir("RAW_DATA/SOM_SCR_R002/", pattern = m, full.names = TRUE)
  temp_df <- read.csv(file.names[i], na.strings = "NoGrowth")
  data_Ctrl_Norm <- rbind(data_Ctrl_Norm, temp_df)
}
str(data_Ctrl_Norm)
## 'data.frame':    36864 obs. of  8 variables:
##  $ Plate                                   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Row                                     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Column                                  : int  0 1 2 3 4 5 6 7 8 9 ...
##  $ Phenotypes.ExperimentGrowthYield        : num  0.755 0.39 0.233 0.484 0.396 ...
##  $ Phenotypes.GenerationTime               : num  0.0505 0.0204 0.0389 -0.0173 -0.0327 ...
##  $ Phenotypes.ExperimentPopulationDoublings: num  0.1907 0.0991 0.0272 0.113 0.0818 ...
##  $ Phenotypes.ExperimentBaseLine           : num  -0.0884 -0.0347 0.1195 0.0367 0.0758 ...
##  $ Phenotypes.ColonySize48h                : num  0.584 0.301 0.193 0.397 0.32 ...

The most useful for this study will be,

  • Column No: 4 i.e. Phenotypes.ExperimentGrowthYield
  • Column No: 5 i.e. Phenotypes.GenerationTime

Extract only these two columns

data_Ctrl_Norm_Trim <- data_Ctrl_Norm[, 4:5]
colnames(data_Ctrl_Norm_Trim) <- c("CTRL_Y_NORM", "CTRL_GT_NORM")

GENERATE ACETIC ACID NORMALIZED DATASET

Same as above

m <- vector(mode = "character", length = 0)
file.names<-vector(mode = "character", length = 0)
temp_df<-data.frame()
data_AA_Norm <- data.frame()
for(i in 1:24){
  m <- paste0("aa", i, ".phenotypes.Normalized") 
  file.names[i] <- dir("RAW_DATA/SOM_SCR_R002/", pattern = m, full.names = TRUE)
  temp_df <- read.csv(file.names[i], na.strings = "NoGrowth")
  data_AA_Norm <- rbind(data_AA_Norm, temp_df)
}
data_AA_Norm_Trim <- data_AA_Norm[, 4:5]
colnames(data_AA_Norm_Trim) <- c("AA_Y_NORM", "AA_GT_NORM")
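
The four import loops above differ only in the condition prefix and the dataset type. A possible way to factor them into one function is sketched below; read_plates() is a hypothetical helper, and the demonstration runs on two made-up files written to a temporary folder rather than on the real RAW_DATA folder.

```r
# read_plates() is a hypothetical helper that mirrors the loops above.
read_plates <- function(folder, prefix, type, plates = 1:24) {
  per_plate <- lapply(plates, function(i) {
    pattern <- paste0(prefix, i, ".phenotypes.", type)
    path <- dir(folder, pattern = pattern, full.names = TRUE)
    # Exactly one file is expected per plate, condition and type.
    if (length(path) != 1) {
      stop("Expected one file for pattern '", pattern,
           "', found ", length(path))
    }
    read.csv(path, na.strings = "NoGrowth")
  })
  # Stack the plates in plate order, as the loops do with rbind().
  do.call(rbind, per_plate)
}

# Demonstration on two made-up plate files in a temporary folder.
demo_dir <- tempfile("som_demo_")
dir.create(demo_dir)
for (i in 1:2) {
  writeLines(
    c("Plate,Row,Column,Phenotypes.GenerationTime",
      "0,0,0,2.5",
      "0,1,0,NoGrowth"),
    file.path(demo_dir, paste0("Ctrl", i, ".phenotypes.Absolute.csv"))
  )
}
demo <- read_plates(demo_dir, "Ctrl", "Absolute", plates = 1:2)
str(demo)
```

Under these assumptions, a call such as read_plates("RAW_DATA/SOM_SCR_R002/", "aa", "Normalized") would correspond to the last loop above.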

COMBINE THE DATASETS TO OBTAIN FINAL DATAFRAME

Trimmed datasets are combined to obtain the final data.frame. The combined data frame is labeled as data from ROUND2

R <- rep("2nd_round", 36864)
Round_ID <- data.frame(R, stringsAsFactors = FALSE)
whole_data_R2 <- cbind(Metadata_CRISPRi, 
                       Round_ID, 
                       data_Ctrl_Abs_Trim, 
                       data_AA_Abs_Trim, 
                       data_Ctrl_Norm_Trim, 
                       data_AA_Norm_Trim)
colnames(whole_data_R2)[12] <- "Round_ID"
str(whole_data_R2)
## 'data.frame':    36864 obs. of  20 variables:
##  $ SL_No             : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ gRNA_name         : chr  "FBP26-TRg-1" "FBP26-TRg-1" "HMI1-NRg-1" "HMI1-NRg-1" ...
##  $ Seq               : chr  "GCTTATCATACATTTACATC" "GCTTATCATACATTTACATC" "AAAAATTCTGACACATCACA" "AAAAATTCTGACACATCACA" ...
##  $ SOURCEPLATEID     : chr  "R2877.H.001" "R2877.H.001" "R2877.H.001" "R2877.H.001" ...
##  $ SOURCEDENSITY     : chr  "384A" "384A" "384A" "384A" ...
##  $ SOURCECOLONYCOLUMN: int  1 1 2 2 3 3 4 4 5 5 ...
##  $ SOURCECOLONYROW   : chr  "A" "A" "A" "A" ...
##  $ border            : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ GENE              : chr  "FBP26" "FBP26" "HMI1" "HMI1" ...
##  $ Control.gRNA      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Location_1536     : chr  "A1" "A2" "A3" "A4" ...
##  $ Round_ID          : chr  "2nd_round" "2nd_round" "2nd_round" "2nd_round" ...
##  $ CTRL_Y_ABS        : num  8533790 6626182 5943280 5455589 5133404 ...
##  $ CTRL_GT_ABS       : num  2.53 2.48 2.51 2.56 2.53 ...
##  $ AA_Y_ABS          : num  2090439 2241914 1861277 1920070 1957912 ...
##  $ AA_GT_ABS         : num  8.92 8.43 9.19 8.91 8.87 ...
##  $ CTRL_Y_NORM       : num  0.755 0.39 0.233 0.484 0.396 ...
##  $ CTRL_GT_NORM      : num  0.0505 0.0204 0.0389 -0.0173 -0.0327 ...
##  $ AA_Y_NORM         : num  0.373 0.474 0.205 0.386 0.415 ...
##  $ AA_GT_NORM        : num  0.019 -0.0614 0.0621 -0.2135 -0.219 ...
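
Note that cbind() aligns rows purely by position, so the combination above relies on the key file and the stacked plate files being in the same order. The sketch below shows a minimal guard against one kind of mismatch, a differing number of rows; it does not verify the row order itself. check_same_rows() and the toy tables are hypothetical.

```r
# check_same_rows() is a hypothetical helper: it stops if the tables that
# are about to be column-bound differ in their number of rows.
check_same_rows <- function(...) {
  pieces <- list(...)
  n <- vapply(pieces, nrow, integer(1))
  if (length(unique(n)) != 1) {
    stop("Row counts differ: ", paste(n, collapse = ", "))
  }
  invisible(n[1])
}

# Toy stand-ins for the metadata and one trimmed phenotype table.
toy_meta  <- data.frame(gRNA_name = c("g1", "g1", "g2"),
                        stringsAsFactors = FALSE)
toy_pheno <- data.frame(CTRL_GT_ABS = c(2.53, 2.48, 2.51))

n_rows <- check_same_rows(toy_meta, toy_pheno)
toy_combined <- cbind(toy_meta,
                      Round_ID = rep("2nd_round", n_rows),
                      toy_pheno)
str(toy_combined)
```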

IMPORT RESULTS FROM ROUND1

The results from Round1 are already compiled into a .csv file in the COMPILED_DATA folder.

Results 1st Round : 20190903_CRISPRi_Screen_aa_1st_round.csv

Import the dataset and label it as data from ROUND1

First_round <- read.csv("COMPILED_DATA/20190903_CRISPRi_Screen_aa_1st_round.csv", 
                        na.strings = c("#N/A", "NoGrowth"), 
                        stringsAsFactors = FALSE)
R <- rep("1st_round", 36864)
Round_ID <- data.frame(R, stringsAsFactors = FALSE)
whole_data_R1 <- cbind(Metadata_CRISPRi, Round_ID, First_round[, 12:19])
colnames(whole_data_R1)[12] <- "Round_ID"
str(whole_data_R1)
## 'data.frame':    36864 obs. of  20 variables:
##  $ SL_No             : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ gRNA_name         : chr  "FBP26-TRg-1" "FBP26-TRg-1" "HMI1-NRg-1" "HMI1-NRg-1" ...
##  $ Seq               : chr  "GCTTATCATACATTTACATC" "GCTTATCATACATTTACATC" "AAAAATTCTGACACATCACA" "AAAAATTCTGACACATCACA" ...
##  $ SOURCEPLATEID     : chr  "R2877.H.001" "R2877.H.001" "R2877.H.001" "R2877.H.001" ...
##  $ SOURCEDENSITY     : chr  "384A" "384A" "384A" "384A" ...
##  $ SOURCECOLONYCOLUMN: int  1 1 2 2 3 3 4 4 5 5 ...
##  $ SOURCECOLONYROW   : chr  "A" "A" "A" "A" ...
##  $ border            : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ GENE              : chr  "FBP26" "FBP26" "HMI1" "HMI1" ...
##  $ Control.gRNA      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Location_1536     : chr  "A1" "A2" "A3" "A4" ...
##  $ Round_ID          : chr  "1st_round" "1st_round" "1st_round" "1st_round" ...
##  $ CTRL_Y_ABS        : num  7046465 5380541 4841666 4517281 4209174 ...
##  $ CTRL_GT_ABS       : num  3.15 3.64 3.18 3.04 3.13 ...
##  $ AA_Y_ABS          : num  833657 844666 708042 734725 717147 ...
##  $ AA_GT_ABS         : num  13.1 13.6 14.5 14.7 13.6 ...
##  $ CTRL_Y_NORM       : num  0.899 0.51 0.358 0.565 0.463 ...
##  $ CTRL_GT_NORM      : num  -0.10076 0.10855 -0.08699 0.00504 0.0462 ...
##  $ AA_Y_NORM         : num  -0.362 -0.343 -0.598 -0.642 -0.677 ...
##  $ AA_GT_NORM        : num  0.273 0.331 0.423 0.425 0.313 ...

COMBINE THE DATASETS of ROUND 1 AND 2

whole_data_CRISPRi_aa <- rbind(whole_data_R1, whole_data_R2)

SCAN-O-MATIC PHENOMICS ANALYSIS

In this study, most of the downstream analysis was performed on the generation time (GT) phenotype.

PURPOSE 2

In this session, downstream data processing and statistical analysis of SCAN-O-MATIC raw output will be performed

ESTIMATE THE LOG PHENOTYPIC INDEX (LPI) VALUES

The LPI of a strain is the difference between its normalized generation time (GT) or yield (Y) (the LSC values, see IMPORT SCAN-O-MATIC RAW DATA) on the acetic acid stress plate and in the basal condition. It gives a relative estimate of how a strain performed under acetic acid stress compared with the basal condition.

The relative generation time is LPI_GT = LSC_GT_Acetic_Acid - LSC_GT_Basal, and the relative yield is computed in the same way as LPI_Y = LSC_Y_Acetic_Acid - LSC_Y_Basal.

whole_data_CRISPRi_aa[, 21] <- whole_data_CRISPRi_aa[, 19]-whole_data_CRISPRi_aa[, 17]
whole_data_CRISPRi_aa[, 22] <- whole_data_CRISPRi_aa[, 20]-whole_data_CRISPRi_aa[, 18]
colnames(whole_data_CRISPRi_aa)[21] <- "LPI_Y"
colnames(whole_data_CRISPRi_aa)[22] <- "LPI_GT"
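
As a small worked example of the subtraction, the sketch below computes LPI_GT by column name for two rows, using the first two second-round LSC values shown in the str() output above. Because the LSC values are on a log2 scale, 2^LPI_GT is the corresponding ratio on the linear scale. The table toy is only an illustration.

```r
# LSC values (log2 scale) copied from the first two second-round rows above.
toy <- data.frame(
  CTRL_GT_NORM = c(0.0505, 0.0204),
  AA_GT_NORM   = c(0.0190, -0.0614)
)

# LPI_GT = LSC_GT_Acetic_Acid - LSC_GT_Basal
toy$LPI_GT <- toy$AA_GT_NORM - toy$CTRL_GT_NORM

# Back-transform from log2 to a ratio on the linear scale.
toy$LPI_GT_ratio <- 2^toy$LPI_GT
print(toy)
```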

PERFORM PLATE-WISE BATCH CORRECTION

Plate-wise batch correction was performed by subtracting the median LSC GT value of all colonies on a plate from the LSC GT value of each individual colony growing on that plate. The same correction is applied to the yield (Y) values, separately for each round and condition.

For example, if strainX grows in the basal condition on plate Z, its corrected LSC_GT value in the basal condition is:

  • LSC_GT_Basal_Corrected(strainX) = LSC_GT_Basal(strainX) - Median(LSC_GT_Basal(PlateZ))
plate_ID <- as.character(unique(whole_data_CRISPRi_aa$SOURCEPLATEID))
whole_data_CRISPRi_aa_corrected <- whole_data_CRISPRi_aa
med_LogLSCctrl_RND1_GT <- numeric(0)
med_LogLSCaa_RND1_GT <- numeric(0)
med_LogLSCctrl_RND2_GT <- numeric(0)
med_LogLSCaa_RND2_GT <- numeric(0)
med_LogLSCctrl_RND1_Y <- numeric(0)
med_LogLSCaa_RND1_Y <- numeric(0)
med_LogLSCctrl_RND2_Y <- numeric(0)
med_LogLSCaa_RND2_Y <- numeric(0)

for(i in 1:24){
med_LogLSCctrl_RND1_GT[i] <- median(whole_data_CRISPRi_aa_corrected$CTRL_GT_NORM[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i]
                                                                                       & !is.na(whole_data_CRISPRi_aa_corrected$CTRL_GT_NORM) 
                                                                                         & whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round")])

med_LogLSCaa_RND1_GT[i] <- median(whole_data_CRISPRi_aa_corrected$AA_GT_NORM[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i]
                                                                                     & !is.na(whole_data_CRISPRi_aa_corrected$AA_GT_NORM) 
                                                                                     & whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round")])

med_LogLSCctrl_RND2_GT[i] <- median(whole_data_CRISPRi_aa_corrected$CTRL_GT_NORM[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i]
                                                                                         & !is.na(whole_data_CRISPRi_aa_corrected$CTRL_GT_NORM) 
                                                                                         & whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round")])

med_LogLSCaa_RND2_GT[i] <- median(whole_data_CRISPRi_aa_corrected$AA_GT_NORM[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i]
                                                                                     & !is.na(whole_data_CRISPRi_aa_corrected$AA_GT_NORM) 
                                                                                     & whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round")])

med_LogLSCctrl_RND1_Y[i] <- median(whole_data_CRISPRi_aa_corrected$CTRL_Y_NORM[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i]
                                                                                       & !is.na(whole_data_CRISPRi_aa_corrected$CTRL_Y_NORM) 
                                                                                       & whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round")])

med_LogLSCaa_RND1_Y[i] <- median(whole_data_CRISPRi_aa_corrected$AA_Y_NORM[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i]
                                                                                   & !is.na(whole_data_CRISPRi_aa_corrected$AA_Y_NORM) 
                                                                                   & whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round")])

med_LogLSCctrl_RND2_Y[i] <- median(whole_data_CRISPRi_aa_corrected$CTRL_Y_NORM[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i]
                                                                                       & !is.na(whole_data_CRISPRi_aa_corrected$CTRL_Y_NORM) 
                                                                                       & whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round")])

med_LogLSCaa_RND2_Y[i] <- median(whole_data_CRISPRi_aa_corrected$AA_Y_NORM[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i]
                                                                                   & !is.na(whole_data_CRISPRi_aa_corrected$AA_Y_NORM) 
                                                                                   & whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round")])
  
whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
                                          whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round") , 23] <- whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
                                                                                                                                                  whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round"), 17] - med_LogLSCctrl_RND1_Y[i]
  whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
                                          whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round") , 24] <- whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
                                                                                                                                                  whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round"), 18] - med_LogLSCctrl_RND1_GT[i]
  whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
                                          whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round") , 23] <- whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
                                                                                                                                                  whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round"), 17] - med_LogLSCctrl_RND2_Y[i]
  whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
                                          whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round") , 24] <- whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
                                                                                                                                                  whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round"), 18] - med_LogLSCctrl_RND2_GT[i]
  whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
                                          whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round") , 25] <- whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
                                                                                                                                                  whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round"), 19] - med_LogLSCaa_RND1_Y[i]
  whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
                                          whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round") , 26] <- whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
                                                                                                                                                  whole_data_CRISPRi_aa_corrected$Round_ID=="1st_round"), 20] - med_LogLSCaa_RND1_GT[i]
  whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
                                          whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round") , 25] <- whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
                                                                                                                                                  whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round"), 19] - med_LogLSCaa_RND2_Y[i]
  whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
                                          whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round") , 26] <- whole_data_CRISPRi_aa_corrected[which(whole_data_CRISPRi_aa_corrected$SOURCEPLATEID==plate_ID[i] &
                                                                                                                                                  whole_data_CRISPRi_aa_corrected$Round_ID=="2nd_round"), 20] - med_LogLSCaa_RND2_GT[i]
}
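
The same per-plate, per-round median centering can be written more compactly with base R's ave(). The following is a sketch on made-up values, not a replacement for the loop above; the toy table and the helper center_on_median() are hypothetical.

```r
# Toy data: two plates in one round, LSC values on a log2 scale.
toy <- data.frame(
  SOURCEPLATEID = c("P1", "P1", "P1", "P2", "P2", "P2"),
  Round_ID      = "1st_round",
  CTRL_GT_NORM  = c(0.10, 0.30, NA, -0.20, 0.00, 0.40),
  stringsAsFactors = FALSE
)

# Subtract the group median; NA values are ignored when the median is
# computed and stay NA in the result.
center_on_median <- function(x) x - median(x, na.rm = TRUE)

# Groups are defined by plate and round, as in the loop above.
toy$CTRL_GT_NORM_CR <- ave(toy$CTRL_GT_NORM,
                           toy$SOURCEPLATEID, toy$Round_ID,
                           FUN = center_on_median)
print(toy)
```

In this toy case the plate medians are 0.2 (P1) and 0 (P2), so the first plate is shifted down by 0.2 and the second is unchanged.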

ESTIMATE THE BATCH CORRECTED LOG PHENOTYPIC INDEX (LPI) VALUES

Estimate the corrected LPI values (see ESTIMATE THE LOG PHENOTYPIC INDEX (LPI) VALUES) from the corrected LSC values,

i.e. LPI_GT_corrected = LSC_GT_Acetic_Acid_corrected - LSC_GT_Basal_corrected

Estimate the corrected LPI_Y

whole_data_CRISPRi_aa_corrected[, 27] <- whole_data_CRISPRi_aa_corrected[, 25] - whole_data_CRISPRi_aa_corrected[, 23]

Estimate the corrected LPI_GT

whole_data_CRISPRi_aa_corrected[, 28] <- whole_data_CRISPRi_aa_corrected[, 26] - whole_data_CRISPRi_aa_corrected[, 24] 

SETTING THE NAMES OF THE NEW COLUMNS

colnm <- colnames(whole_data_CRISPRi_aa)[17:22]
colnm <- paste0(colnm, "_CR")
colnames(whole_data_CRISPRi_aa_corrected)[23:28] <- colnm
str(whole_data_CRISPRi_aa_corrected)
## 'data.frame':    73728 obs. of  28 variables:
##  $ SL_No             : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ gRNA_name         : chr  "FBP26-TRg-1" "FBP26-TRg-1" "HMI1-NRg-1" "HMI1-NRg-1" ...
##  $ Seq               : chr  "GCTTATCATACATTTACATC" "GCTTATCATACATTTACATC" "AAAAATTCTGACACATCACA" "AAAAATTCTGACACATCACA" ...
##  $ SOURCEPLATEID     : chr  "R2877.H.001" "R2877.H.001" "R2877.H.001" "R2877.H.001" ...
##  $ SOURCEDENSITY     : chr  "384A" "384A" "384A" "384A" ...
##  $ SOURCECOLONYCOLUMN: int  1 1 2 2 3 3 4 4 5 5 ...
##  $ SOURCECOLONYROW   : chr  "A" "A" "A" "A" ...
##  $ border            : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ GENE              : chr  "FBP26" "FBP26" "HMI1" "HMI1" ...
##  $ Control.gRNA      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Location_1536     : chr  "A1" "A2" "A3" "A4" ...
##  $ Round_ID          : chr  "1st_round" "1st_round" "1st_round" "1st_round" ...
##  $ CTRL_Y_ABS        : num  7046465 5380541 4841666 4517281 4209174 ...
##  $ CTRL_GT_ABS       : num  3.15 3.64 3.18 3.04 3.13 ...
##  $ AA_Y_ABS          : num  833657 844666 708042 734725 717147 ...
##  $ AA_GT_ABS         : num  13.1 13.6 14.5 14.7 13.6 ...
##  $ CTRL_Y_NORM       : num  0.899 0.51 0.358 0.565 0.463 ...
##  $ CTRL_GT_NORM      : num  -0.10076 0.10855 -0.08699 0.00504 0.0462 ...
##  $ AA_Y_NORM         : num  -0.362 -0.343 -0.598 -0.642 -0.677 ...
##  $ AA_GT_NORM        : num  0.273 0.331 0.423 0.425 0.313 ...
##  $ LPI_Y             : num  -1.262 -0.854 -0.956 -1.207 -1.14 ...
##  $ LPI_GT            : num  0.374 0.223 0.51 0.42 0.266 ...
##  $ CTRL_Y_NORM_CR    : num  0.918 0.529 0.376 0.583 0.482 ...
##  $ CTRL_GT_NORM_CR   : num  -0.0877 0.1216 -0.074 0.0181 0.0592 ...
##  $ AA_Y_NORM_CR      : num  -0.0736 -0.0547 -0.3092 -0.3534 -0.3883 ...
##  $ AA_GT_NORM_CR     : num  0.19 0.248 0.34 0.342 0.229 ...
##  $ LPI_Y_CR          : num  -0.991 -0.583 -0.686 -0.937 -0.87 ...
##  $ LPI_GT_CR         : num  0.278 0.127 0.414 0.323 0.17 ...

EXTRACT ONLY THE BATCH CORRECTED COLUMNS

whole_data_CRISPRi_aa_2 <- whole_data_CRISPRi_aa_corrected[, c(1:16, 23:28)]
colnames(whole_data_CRISPRi_aa_2)[17:22] <- colnames(whole_data_CRISPRi_aa)[17:22]
str(whole_data_CRISPRi_aa_2)
## 'data.frame':    73728 obs. of  22 variables:
##  $ SL_No             : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ gRNA_name         : chr  "FBP26-TRg-1" "FBP26-TRg-1" "HMI1-NRg-1" "HMI1-NRg-1" ...
##  $ Seq               : chr  "GCTTATCATACATTTACATC" "GCTTATCATACATTTACATC" "AAAAATTCTGACACATCACA" "AAAAATTCTGACACATCACA" ...
##  $ SOURCEPLATEID     : chr  "R2877.H.001" "R2877.H.001" "R2877.H.001" "R2877.H.001" ...
##  $ SOURCEDENSITY     : chr  "384A" "384A" "384A" "384A" ...
##  $ SOURCECOLONYCOLUMN: int  1 1 2 2 3 3 4 4 5 5 ...
##  $ SOURCECOLONYROW   : chr  "A" "A" "A" "A" ...
##  $ border            : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ GENE              : chr  "FBP26" "FBP26" "HMI1" "HMI1" ...
##  $ Control.gRNA      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Location_1536     : chr  "A1" "A2" "A3" "A4" ...
##  $ Round_ID          : chr  "1st_round" "1st_round" "1st_round" "1st_round" ...
##  $ CTRL_Y_ABS        : num  7046465 5380541 4841666 4517281 4209174 ...
##  $ CTRL_GT_ABS       : num  3.15 3.64 3.18 3.04 3.13 ...
##  $ AA_Y_ABS          : num  833657 844666 708042 734725 717147 ...
##  $ AA_GT_ABS         : num  13.1 13.6 14.5 14.7 13.6 ...
##  $ CTRL_Y_NORM       : num  0.918 0.529 0.376 0.583 0.482 ...
##  $ CTRL_GT_NORM      : num  -0.0877 0.1216 -0.074 0.0181 0.0592 ...
##  $ AA_Y_NORM         : num  -0.0736 -0.0547 -0.3092 -0.3534 -0.3883 ...
##  $ AA_GT_NORM        : num  0.19 0.248 0.34 0.342 0.229 ...
##  $ LPI_Y             : num  -0.991 -0.583 -0.686 -0.937 -0.87 ...
##  $ LPI_GT            : num  0.278 0.127 0.414 0.323 0.17 ...

CONSTRUCT A NEW DATA STRUCTURE

Construct a new data structure in which each strain (each carrying a unique guide RNA) occupies a separate row, with the replicates from the first and second rounds side by side. Also add the mean, median and standard deviation of each phenotype.

REMOVE ROWS WITH SPATIAL CONTROL STRAIN DATA

Data_CRISPRi_aa <- subset(whole_data_CRISPRi_aa_2, whole_data_CRISPRi_aa_2$gRNA_name!="SP_Ctrl_CC23")

CREATE A TABLE OF UNIQUE gRNA

df_unique_sgRNA <- data.frame(table(Data_CRISPRi_aa$gRNA_name))
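
The arrangement step that follows indexes replicates as 1:3 (first round) and 4:6 (second round), which suggests three rows per gRNA and round. Under that assumption, the sketch below flags gRNAs with a different number of rows; the toy table is made up and only stands in for Data_CRISPRi_aa.

```r
# Toy stand-in for Data_CRISPRi_aa: g1 has all six rows, g2 lacks one.
toy <- data.frame(
  gRNA_name = c(rep("g1", 6), rep("g2", 5)),
  Round_ID  = c(rep(c("1st_round", "2nd_round"), each = 3),
                rep("1st_round", 3), rep("2nd_round", 2)),
  stringsAsFactors = FALSE
)

# Number of rows per gRNA and round.
counts <- table(toy$gRNA_name, toy$Round_ID)
print(counts)

# gRNAs that do not have exactly three rows in every round
# (three per round is an assumption, see the text above).
expected_per_round <- 3
incomplete <- rownames(counts)[rowSums(counts != expected_per_round) > 0]
print(incomplete)
```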

ARRANGE THE DATA IN THE DESIRED FORMAT

R1<-vector(mode = "integer", length = 0)
R2<-vector(mode = "integer", length = 0)
test2<-data.frame()
n<-nrow(df_unique_sgRNA)
for(i in 1:n){
  R1 <- which(Data_CRISPRi_aa$gRNA_name==df_unique_sgRNA$Var1[i] & Data_CRISPRi_aa$Round_ID=="1st_round")
  R2 <- which(Data_CRISPRi_aa$gRNA_name==df_unique_sgRNA$Var1[i] & Data_CRISPRi_aa$Round_ID=="2nd_round")
  test1 <- Data_CRISPRi_aa[c(R1, R2), ]
  test2[i, c(1:8)]<-test1[1, c(2:4, 6:7, 9:11)]
  test2[i, c(9:14)] <- test1$CTRL_GT_NORM
  test2[i, 15] <- mean(test1$CTRL_GT_NORM[1:3])
  test2[i, 16] <- mean(test1$CTRL_GT_NORM[4:6])
  test2[i, 17] <- sd(test1$CTRL_GT_NORM[1:3])
  test2[i, 18] <- sd(test1$CTRL_GT_NORM[4:6])
  test2[i, 19] <- mean(test1$CTRL_GT_NORM[1:6])
  test2[i, 20] <- median(test1$CTRL_GT_NORM[1:6])
  test2[i, 21] <- sd(test1$CTRL_GT_NORM[1:6])
  test2[i, c(22:27)] <- test1$AA_GT_NORM
  test2[i, 28] <- mean(test1$AA_GT_NORM[1:3])
  test2[i, 29] <- mean(test1$AA_GT_NORM[4:6])
  test2[i, 30] <- sd(test1$AA_GT_NORM[1:3])
  test2[i, 31] <- sd(test1$AA_GT_NORM[4:6])
  test2[i, 32] <- mean(test1$AA_GT_NORM[1:6])
  test2[i, 33] <- median(test1$AA_GT_NORM[1:6])
  test2[i, 34] <- sd(test1$AA_GT_NORM[1:6])
  test2[i, c(35:40)] <- test1$LPI_GT
  test2[i, 41] <- mean(test1$LPI_GT[1:3])
  test2[i, 42] <- mean(test1$LPI_GT[4:6])
  test2[i, 43] <- sd(test1$LPI_GT[1:3])
  test2[i, 44] <- sd(test1$LPI_GT[4:6])
  test2[i, 45] <- mean(test1$LPI_GT[1:6])
  test2[i, 46] <- median(test1$LPI_GT[1:6])
  test2[i, 47] <- sd(test1$LPI_GT[1:6])
  test2[i, c(48:53)] <- test1$CTRL_Y_NORM
  test2[i, 54] <- mean(test1$CTRL_Y_NORM[1:3])
  test2[i, 55] <- mean(test1$CTRL_Y_NORM[4:6])
  test2[i, 56] <- sd(test1$CTRL_Y_NORM[1:3])
  test2[i, 57] <- sd(test1$CTRL_Y_NORM[4:6])
  test2[i, 58] <- mean(test1$CTRL_Y_NORM[1:6])
  test2[i, 59] <- median(test1$CTRL_Y_NORM[1:6])
  test2[i, 60] <- sd(test1$CTRL_Y_NORM[1:6])
  test2[i, c(61:66)] <- test1$AA_Y_NORM
  test2[i, 67] <- mean(test1$AA_Y_NORM[1:3])
  test2[i, 68] <- mean(test1$AA_Y_NORM[4:6])
  test2[i, 69] <- sd(test1$AA_Y_NORM[1:3])
  test2[i, 70] <- sd(test1$AA_Y_NORM[4:6])
  test2[i, 71] <- mean(test1$AA_Y_NORM[1:6])
  test2[i, 72] <- median(test1$AA_Y_NORM[1:6])
  test2[i, 73] <- sd(test1$AA_Y_NORM[1:6])
  test2[i, c(74:79)] <- test1$LPI_Y
  test2[i, 80] <- mean(test1$LPI_Y[1:3])
  test2[i, 81] <- mean(test1$LPI_Y[4:6])
  test2[i, 82] <- sd(test1$LPI_Y[1:3])
  test2[i, 83] <- sd(test1$LPI_Y[4:6])
  test2[i, 84] <- mean(test1$LPI_Y[1:6])
  test2[i, 85] <- median(test1$LPI_Y[1:6])
  test2[i, 86] <- sd(test1$LPI_Y[1:6])
}
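The per-phenotype block of seven statistics that the loop above repeats for each phenotype can be sketched as a single helper. This is a minimal illustration, not the code used for the screen: `summarise_replicates` and the toy values are hypothetical, and the helper assumes exactly six values ordered as Round 1 (replicates 1 to 3) followed by Round 2 (replicates 4 to 6), the same ordering the loop relies on.

```r
# Hypothetical helper: the seven summary statistics computed per phenotype,
# in the same order as the columns written by the loop
# (round means, round SDs, overall mean, overall median, overall SD).
summarise_replicates <- function(x) {
  stopifnot(length(x) == 6)
  c(RND1_MEAN   = mean(x[1:3]),
    RND2_MEAN   = mean(x[4:6]),
    RND1_SD     = sd(x[1:3]),
    RND2_SD     = sd(x[4:6]),
    RND1_2_MEAN = mean(x),
    RND1_2_MED  = median(x),
    RND1_2_SD   = sd(x))
}

# Toy values for illustration only (not taken from the screen)
toy_values <- c(0.10, 0.20, 0.30, 0.40, 0.50, 0.60)
toy_stats <- summarise_replicates(toy_values)
print(toy_stats)
```

Like the loop, this sketch does not remove missing values, so any NA among the replicates propagates to the affected statistics.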

ASSIGN COLUMN NAMES

The column names are already stored in a text file available in the COMPILED_DATA folder. After assigning them, store the data.frame under a new name.

column_names <- read.table("COMPILED_DATA/Column_names.txt", header = FALSE, sep = "\t", as.is = TRUE)
colnames(test2) <- column_names$V1
Analysis_CRISPRi_aa_Complete <- test2
str(Analysis_CRISPRi_aa_Complete)
## 'data.frame':    9078 obs. of  86 variables:
##  $ gRNA_name          : chr  "AAR2-NRg-3" "AAR2-NRg-4" "AAR2-TRg-15" "AAR2-TRg-16" ...
##  $ Seq                : chr  "CCAGCGATAAGGAGGATCTT" "TGTGTCCTTTCTTCATCTCT" "AAAAGGAAAAAGTAATTAGG" "GTGAAAAGGAAAAAGTAATT" ...
##  $ SOURCEPLATEID      : chr  "R2877.H.023" "R2877.H.024" "R2877.H.023" "R2877.H.023" ...
##  $ SOURCECOLONYCOLUMN : int  5 21 22 20 21 18 6 7 9 8 ...
##  $ SOURCECOLONYROW    : chr  "O" "L" "P" "N" ...
##  $ GENE               : chr  "AAR2" "AAR2" "AAR2" "AAR2" ...
##  $ Control.gRNA       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Location_1536      : chr  "AC9" "W41" "AE43" "AA39" ...
##  $ CTRL_GT_RND1_R1    : num  0.00397 0.00586 0.10456 -0.01142 0.0174 ...
##  $ CTRL_GT_RND1_R2    : num  -0.0086 0.00166 0.07118 0.00256 0.04094 ...
##  $ CTRL_GT_RND1_R3    : num  -0.000625 -0.043918 -0.009614 0.007969 -0.014018 ...
##  $ CTRL_GT_RND2_R1    : num  0.0431 -0.0336 0.0292 0.0381 0.0742 ...
##  $ CTRL_GT_RND2_R2    : num  -0.0165 0.00744 0.01433 0.06599 -0.02973 ...
##  $ CTRL_GT_RND2_R3    : num  0.03266 -0.04468 -0.04455 0.01994 -0.00924 ...
##  $ CTRL_GT_RND1_MEAN  : num  -0.00175 -0.01213 0.05538 -0.0003 0.01477 ...
##  $ CTRL_GT_RND2_MEAN  : num  0.019756 -0.023599 -0.000341 0.041342 0.011747 ...
##  $ CTRL_GT_RND1_SD    : num  0.00636 0.02761 0.05871 0.01001 0.02757 ...
##  $ CTRL_GT_RND2_SD    : num  0.0318 0.0275 0.039 0.0232 0.0551 ...
##  $ CTRL_GT_RND1_2_MEAN: num  0.009 -0.0179 0.0275 0.0205 0.0133 ...
##  $ CTRL_GT_RND1_2_MED : num  0.00167 -0.01595 0.02176 0.01395 0.00408 ...
##  $ CTRL_GT_RND1_2_SD  : num  0.0237 0.0254 0.054 0.0278 0.039 ...
##  $ AA_GT_RND1_R1      : num  -0.0239 0.06 -0.0178 -0.1839 0.5993 ...
##  $ AA_GT_RND1_R2      : num  -0.0311 0.0763 -0.0145 -0.1293 0.2619 ...
##  $ AA_GT_RND1_R3      : num  0.0315 -0.0191 -0.0778 -0.0514 0.3276 ...
##  $ AA_GT_RND2_R1      : num  0.0142 -0.1375 0.111 0.0105 0.2021 ...
##  $ AA_GT_RND2_R2      : num  0.0383 -0.2266 0.0588 0.0966 0.1831 ...
##  $ AA_GT_RND2_R3      : num  0.0265 -0.2298 0.0789 0.0787 0.0114 ...
##  $ AA_GT_RND1_MEAN    : num  -0.00781 0.03905 -0.03671 -0.12151 0.39624 ...
##  $ AA_GT_RND2_MEAN    : num  0.0263 -0.198 0.0829 0.0619 0.1322 ...
##  $ AA_GT_RND1_SD      : num  0.0343 0.051 0.0356 0.0666 0.1789 ...
##  $ AA_GT_RND2_SD      : num  0.0121 0.0524 0.0263 0.0455 0.105 ...
##  $ AA_GT_RND1_2_MEAN  : num  0.00925 -0.07945 0.02308 -0.02979 0.2642 ...
##  $ AA_GT_RND1_2_MED   : num  0.0203 -0.0783 0.0221 -0.0205 0.232 ...
##  $ AA_GT_RND1_2_SD    : num  0.0296 0.1378 0.0712 0.1127 0.1953 ...
##  $ LPI_GT_RND1_R1     : num  -0.0278 0.0541 -0.1224 -0.1724 0.5819 ...
##  $ LPI_GT_RND1_R2     : num  -0.0225 0.0746 -0.0857 -0.1318 0.2209 ...
##  $ LPI_GT_RND1_R3     : num  0.0322 0.0248 -0.0682 -0.0594 0.3416 ...
##  $ LPI_GT_RND2_R1     : num  -0.0289 -0.1039 0.0818 -0.0276 0.1279 ...
##  $ LPI_GT_RND2_R2     : num  0.0548 -0.2341 0.0444 0.0306 0.2128 ...
##  $ LPI_GT_RND2_R3     : num  -0.0062 -0.1851 0.1234 0.0588 0.0206 ...
##  $ LPI_GT_RND1_MEAN   : num  -0.00605 0.05119 -0.09208 -0.12121 0.38147 ...
##  $ LPI_GT_RND2_MEAN   : num  0.00656 -0.17435 0.08321 0.02058 0.12042 ...
##  $ LPI_GT_RND1_SD     : num  0.0332 0.025 0.0276 0.0573 0.1837 ...
##  $ LPI_GT_RND2_SD     : num  0.0433 0.0657 0.0395 0.0441 0.0963 ...
##  $ LPI_GT_RND1_2_MEAN : num  0.000251 -0.061582 -0.004437 -0.050313 0.250944 ...
##  $ LPI_GT_RND1_2_MED  : num  -0.0143 -0.0396 -0.0119 -0.0435 0.2169 ...
##  $ LPI_GT_RND1_2_SD   : num  0.0352 0.1313 0.1007 0.0901 0.1941 ...
##  $ CTRL_Y_RND1_R1     : num  0.055 0.0573 0.0189 0.0779 -0.1151 ...
##  $ CTRL_Y_RND1_R2     : num  0.0131 0.0472 0.0129 0.0562 -0.0723 ...
##  $ CTRL_Y_RND1_R3     : num  0.0306 0.0109 -0.0236 0.1208 -0.0171 ...
##  $ CTRL_Y_RND2_R1     : num  0.0399 0.0113 -0.1422 0.0605 -0.0676 ...
##  $ CTRL_Y_RND2_R2     : num  0.02531 0.0066 -0.17285 -0.00976 -0.08442 ...
##  $ CTRL_Y_RND2_R3     : num  -0.0528 0.0089 0.00598 0.06854 -0.08083 ...
##  $ CTRL_Y_RND1_MEAN   : num  0.03288 0.03847 0.00276 0.08497 -0.06817 ...
##  $ CTRL_Y_RND2_MEAN   : num  0.00414 0.00893 -0.10301 0.03977 -0.07762 ...
##  $ CTRL_Y_RND1_SD     : num  0.0211 0.0244 0.023 0.0329 0.0491 ...
##  $ CTRL_Y_RND2_SD     : num  0.04985 0.00234 0.09563 0.04308 0.00885 ...
##  $ CTRL_Y_RND1_2_MEAN : num  0.0185 0.0237 -0.0501 0.0624 -0.0729 ...
##  $ CTRL_Y_RND1_2_MED  : num  0.028 0.0111 -0.0088 0.0645 -0.0766 ...
##  $ CTRL_Y_RND1_2_SD   : num  0.0377 0.0224 0.085 0.0423 0.032 ...
##  $ AA_Y_RND1_R1       : num  0.0672 0.0106 0.1785 0.272 -2.1 ...
##  $ AA_Y_RND1_R2       : num  -0.3832 0.0102 0.1196 0.232 -1.1873 ...
##  $ AA_Y_RND1_R3       : num  -0.1599 -0.0465 -0.0381 0.1036 -1.0143 ...
##  $ AA_Y_RND2_R1       : num  0.0503 0.3083 -0.2434 0.0968 -0.4223 ...
##  $ AA_Y_RND2_R2       : num  -0.0505 0.4391 -0.2526 -0.094 -0.304 ...
##  $ AA_Y_RND2_R3       : num  -0.00342 0.52795 -0.20049 -0.02924 -0.26528 ...
##  $ AA_Y_RND1_MEAN     : num  -0.15864 -0.00857 0.08665 0.20251 -1.43385 ...
##  $ AA_Y_RND2_MEAN     : num  -0.00121 0.42511 -0.23216 -0.00879 -0.33052 ...
##  $ AA_Y_RND1_SD       : num  0.2252 0.0329 0.112 0.088 0.5834 ...
##  $ AA_Y_RND2_SD       : num  0.0504 0.1105 0.0278 0.097 0.0818 ...
##  $ AA_Y_RND1_2_MEAN   : num  -0.0799 0.2083 -0.0728 0.0969 -0.8822 ...
##  $ AA_Y_RND1_2_MED    : num  -0.027 0.159 -0.119 0.1 -0.718 ...
##  $ AA_Y_RND1_2_SD     : num  0.17 0.248 0.189 0.142 0.71 ...
##  $ LPI_Y_RND1_R1      : num  0.0122 -0.0467 0.1595 0.1942 -1.9849 ...
##  $ LPI_Y_RND1_R2      : num  -0.3963 -0.0369 0.1066 0.1757 -1.1149 ...
##  $ LPI_Y_RND1_R3      : num  -0.1905 -0.0575 -0.0145 -0.0172 -0.9972 ...
##  $ LPI_Y_RND2_R1      : num  0.0104 0.297 -0.1012 0.0363 -0.3547 ...
##  $ LPI_Y_RND2_R2      : num  -0.0758 0.4325 -0.0797 -0.0842 -0.2196 ...
##  $ LPI_Y_RND2_R3      : num  0.0494 0.519 -0.2065 -0.0978 -0.1844 ...
##  $ LPI_Y_RND1_MEAN    : num  -0.1915 -0.047 0.0839 0.1175 -1.3657 ...
##  $ LPI_Y_RND2_MEAN    : num  -0.00536 0.41618 -0.12915 -0.04856 -0.2529 ...
##  $ LPI_Y_RND1_SD      : num  0.2042 0.0103 0.0892 0.1171 0.5395 ...
##  $ LPI_Y_RND2_SD      : num  0.0641 0.1119 0.0678 0.0738 0.0899 ...
##  $ LPI_Y_RND1_2_MEAN  : num  -0.0984 0.1846 -0.0226 0.0345 -0.8093 ...
##  $ LPI_Y_RND1_2_MED   : num  -0.03273 0.13006 -0.04712 0.00953 -0.67593 ...
##  $ LPI_Y_RND1_2_SD    : num  0.169 0.263 0.137 0.126 0.701 ...

PERFORM STATISTICAL ANALYSIS

Multiple statistical methods were applied to identify the best-fitting statistical model for this dataset. We start with the complete dataset and give it a new name to avoid distorting the original dataset.

Analysis_Final <- Analysis_CRISPRi_aa_Complete

METHOD 1

For METHOD 1, we hypothesized that the difference between the mean (µ) phenotypic performance of a specific CRISPRi strain (StrainX) across the two independent experimental rounds (n=2) and the mean phenotypic performance of all CRISPRi strains that fall within the interquartile range (IQR) of the complete dataset is zero, and that any deviation within the IQR arises by chance.

Null Hypothesis : µ(µLPI_GT_StrainX_Round1, µLPI_GT_StrainX_Round2)- µ(InterquartileRange_LPI_GT) = 0

RECALCULATION OF SOME PHENOTYPIC PARAMETERS

In this method, we estimate the mean and standard deviation (SD) of the LPI GT of Round 1 and Round 2 separately for each strain. When one or two of the three replicates of a strain in a round returned a missing value (NA), the mean for that round is calculated from the non-NA replicates; when all three are missing, the round mean is set to NA. The mean and SD across the two rounds are then recalculated from the two round means, and are set to NA if either round mean is missing. We implemented an if-else decision tree for this.

  • Recalculation of the mean and SD of normalized generation time (LSC GT mean) at basal condition
for(i in 1:nrow(Analysis_Final)){
  x1 <- as.numeric(Analysis_Final[i, 9:11][which(!is.na(Analysis_Final[i, 9:11]))])
  x2 <- as.numeric(Analysis_Final[i, 12:14][which(!is.na(Analysis_Final[i, 12:14]))])
  if(length(x1)==0){
    Analysis_Final$CTRL_GT_RND1_MEAN[i] <- NA
  } else{
    Analysis_Final$CTRL_GT_RND1_MEAN[i] <- as.numeric(mean(x1))
  }
  if(length(x2)==0){
    Analysis_Final$CTRL_GT_RND2_MEAN[i] <- NA
  } else{
    Analysis_Final$CTRL_GT_RND2_MEAN[i] <- as.numeric(mean(x2))
  }
  if(sum(is.na(c(Analysis_Final$CTRL_GT_RND1_MEAN[i], Analysis_Final$CTRL_GT_RND2_MEAN[i])))==0){
    Analysis_Final$CTRL_GT_RND1_2_MEAN[i] <- as.numeric(mean(c(Analysis_Final$CTRL_GT_RND1_MEAN[i], Analysis_Final$CTRL_GT_RND2_MEAN[i])))
    Analysis_Final[i, 87] <- as.numeric(sd(c(Analysis_Final$CTRL_GT_RND1_MEAN[i], Analysis_Final$CTRL_GT_RND2_MEAN[i])))
  } else{
    Analysis_Final$CTRL_GT_RND1_2_MEAN[i] <- NA
    Analysis_Final[i, 87] <- NA
  }
}
colnames(Analysis_Final)[87] <- "CTRL_GT_MEAN_RND1_2_SD"
  • Recalculation of the mean and SD of normalized generation time (LSC GT mean) at 150mM acetic acid
for(i in 1:nrow(Analysis_Final)){
  x1 <- as.numeric(Analysis_Final[i, 22:24][which(!is.na(Analysis_Final[i, 22:24]))])
  x2 <- as.numeric(Analysis_Final[i, 25:27][which(!is.na(Analysis_Final[i, 25:27]))])
  if(length(x1)==0){
    Analysis_Final$AA_GT_RND1_MEAN[i] <- NA
  } else{
    Analysis_Final$AA_GT_RND1_MEAN[i] <- as.numeric(mean(x1))
  }
  if(length(x2)==0){
    Analysis_Final$AA_GT_RND2_MEAN[i] <- NA
  } else{
    Analysis_Final$AA_GT_RND2_MEAN[i] <- as.numeric(mean(x2))
  }
  if(sum(is.na(c(Analysis_Final$AA_GT_RND1_MEAN[i], Analysis_Final$AA_GT_RND2_MEAN[i])))==0){
    Analysis_Final$AA_GT_RND1_2_MEAN[i] <- as.numeric(mean(c(Analysis_Final$AA_GT_RND1_MEAN[i], Analysis_Final$AA_GT_RND2_MEAN[i])))
    Analysis_Final[i, 88] <- as.numeric(sd(c(Analysis_Final$AA_GT_RND1_MEAN[i], Analysis_Final$AA_GT_RND2_MEAN[i])))
  } else{
    Analysis_Final$AA_GT_RND1_2_MEAN[i] <- NA
    Analysis_Final[i, 88] <- NA
  }
}
colnames(Analysis_Final)[88] <- "AA_GT_MEAN_RND1_2_SD"
  • Recalculation of the mean and SD of RELATIVE generation time (LPI GT mean) at 150mM acetic acid
for(i in 1:nrow(Analysis_Final)){
  x1 <- as.numeric(Analysis_Final[i, 35:37][which(!is.na(Analysis_Final[i, 35:37]))])
  x2 <- as.numeric(Analysis_Final[i, 38:40][which(!is.na(Analysis_Final[i, 38:40]))])
  if(length(x1)==0){
    Analysis_Final$LPI_GT_RND1_MEAN[i] <- NA
  } else{
    Analysis_Final$LPI_GT_RND1_MEAN[i] <- as.numeric(mean(x1))
  }
  if(length(x2)==0){
    Analysis_Final$LPI_GT_RND2_MEAN[i] <- NA
  } else{
    Analysis_Final$LPI_GT_RND2_MEAN[i] <- as.numeric(mean(x2))
  }
  if(sum(is.na(c(Analysis_Final$LPI_GT_RND1_MEAN[i], Analysis_Final$LPI_GT_RND2_MEAN[i])))==0){
    Analysis_Final$LPI_GT_RND1_2_MEAN[i] <- as.numeric(mean(c(Analysis_Final$LPI_GT_RND1_MEAN[i], Analysis_Final$LPI_GT_RND2_MEAN[i])))
    Analysis_Final[i, 89] <- as.numeric(sd(c(Analysis_Final$LPI_GT_RND1_MEAN[i], Analysis_Final$LPI_GT_RND2_MEAN[i])))
  } else{
    Analysis_Final$LPI_GT_RND1_2_MEAN[i] <- NA
    Analysis_Final[i, 89] <- NA
  }
}
colnames(Analysis_Final)[89] <- "LPI_GT_MEAN_RND1_2_SD"
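The decision tree used in the three loops above follows one pattern: average the non-missing replicates per round, then combine the two round means only when both exist. A minimal sketch of that pattern is given here; `mean_or_na`, `round_summary` and the toy inputs are hypothetical names introduced for illustration.

```r
# Hypothetical helper: mean of the non-missing values, NA if none are left.
mean_or_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_real_ else mean(x)
}

# Hypothetical helper: combine the round means of one strain.
# The mean and SD across rounds are returned only when both round means exist.
round_summary <- function(round1, round2) {
  m1 <- mean_or_na(round1)
  m2 <- mean_or_na(round2)
  if (is.na(m1) || is.na(m2)) {
    c(mean = NA_real_, sd = NA_real_)
  } else {
    c(mean = mean(c(m1, m2)), sd = sd(c(m1, m2)))
  }
}

# Toy replicates for illustration only
res_partial <- round_summary(c(1, NA, 3), c(4, 4, 4))   # one replicate missing
res_missing <- round_summary(c(NA, NA, NA), c(1, 2, 3)) # a whole round missing
print(res_partial)
print(res_missing)
```

Note that the SD returned here is the SD of the two round means (n=2), which is what the `*_MEAN_RND1_2_SD` columns hold, not the SD of the individual replicates.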
EXTRACT ALL LPI GT MEAN DATA POINTS WITHIN INTER-QUARTILE-RANGE (IQR)

BOX PLOT - MEAN RELATIVE GENERATION TIME (LPI GT)

Figure 2: Boxplot of mean relative generation time (LPI GT) for all strains in the library


Display Box-plot statistics

box_stat_LPI_GT_R1_2_mean$stats
##             [,1]
## [1,] -0.16933911
## [2,] -0.02428792
## [3,]  0.02084505
## [4,]  0.07255828
## [5,]  0.21771505
  • 25th Percentile = -0.02428792
  • 75th Percentile = 0.07255828

Therefore, extraction of the data points within IQR

Intermediate_50 <- Analysis_Final$LPI_GT_RND1_2_MEAN[which(Analysis_Final$LPI_GT_RND1_2_MEAN>=-0.02428792
                                                           &Analysis_Final$LPI_GT_RND1_2_MEAN<=0.07255828)]
summary(Intermediate_50)
##       Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
## -0.0242879 -0.0009547  0.0208382  0.0219831  0.0445144  0.0725032
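The hinge values above are typed into the filter by hand. As a hedged alternative, the bounds can be read directly from the box-plot statistics so that they cannot drift from the data. The sketch below uses a simulated vector (`toy_lpi`, an assumption made purely for illustration) in place of `Analysis_Final$LPI_GT_RND1_2_MEAN`.

```r
# Simulated stand-in for the LPI GT means, with one missing value
set.seed(1)
toy_lpi <- c(rnorm(200, mean = 0.02, sd = 0.08), NA)

# boxplot(..., plot = FALSE) returns the five box-plot statistics without drawing;
# rows 2 and 4 of $stats are the lower and upper hinges (about the 25th and 75th percentiles).
bp <- boxplot(toy_lpi, plot = FALSE)
lower_hinge <- bp$stats[2, 1]
upper_hinge <- bp$stats[4, 1]

# Keep the non-missing values that lie between the hinges
keep <- !is.na(toy_lpi) & toy_lpi >= lower_hinge & toy_lpi <= upper_hinge
intermediate_50_toy <- toy_lpi[keep]
summary(intermediate_50_toy)
```

The hinges computed by the box plot can differ slightly from the values returned by `quantile()`, so the two should not be mixed within one analysis.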
ESTIMATE P-VALUE

P-value is estimated by Welch two sample two-sided t-test (an adaptation of Student’s t-test)

for(i in 1:nrow(Analysis_Final)){
  if(sum(is.na(c(Analysis_Final$LPI_GT_RND1_MEAN[i], Analysis_Final$LPI_GT_RND2_MEAN[i])))==0){
    P_value <- t.test(Intermediate_50, c(Analysis_Final$LPI_GT_RND1_MEAN[i], Analysis_Final$LPI_GT_RND2_MEAN[i]))
    Analysis_Final[i, 90] <- P_value$p.value
  } else{
    Analysis_Final[i, 90] <- NA
  }
}
colnames(Analysis_Final)[90] <- "P_value_M1"
FALSE DISCOVERY RATE ADJUSTMENT OF P-VALUE

P-value adjustment by BENJAMINI-HOCHBERG False Discovery Rate (FDR) method

Analysis_Final[which(!is.na(Analysis_Final$P_value_M1)), 91] <- p.adjust(Analysis_Final$P_value_M1[which(!is.na(Analysis_Final$P_value_M1))], 
                                                                      method = "BH", 
                                                                      n = length(Analysis_Final$P_value_M1[which(!is.na(Analysis_Final$P_value_M1))]))
colnames(Analysis_Final)[91] <- "P.adjusted_M1"
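The adjustment above subsets the non-missing P-values by hand and passes their count as `n`. In base R, `p.adjust()` accepts vectors containing NA and returns NA at the same positions, so the shorter call should give the same result. The sketch below checks this on toy P-values (hypothetical numbers, not from the screen).

```r
# Toy P-values with missing entries (illustration only)
p_raw <- c(0.001, NA, 0.020, 0.040, NA, 0.300)

# Short form: NAs are ignored in the correction and kept in the output
p_bh_short <- p.adjust(p_raw, method = "BH")

# Explicit form, mirroring the subsetting used in this document
ok <- !is.na(p_raw)
p_bh_explicit <- rep(NA_real_, length(p_raw))
p_bh_explicit[ok] <- p.adjust(p_raw[ok], method = "BH", n = sum(ok))

print(rbind(p_raw, p_bh_short, p_bh_explicit))
```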
P-VALUE DIAGNOSTICS FOR METHOD 1

NUMBER OF SIGNIFICANT STRAINS

length(Analysis_Final$P_value_M1[which(Analysis_Final$P_value_M1<=0.05)])
## [1] 434
length(Analysis_Final$P.adjusted_M1[which(Analysis_Final$P.adjusted_M1<=0.05)])
## [1] 66
length(Analysis_Final$P_value_M1[which(Analysis_Final$P_value_M1<=0.1)])
## [1] 842
length(Analysis_Final$P.adjusted_M1[which(Analysis_Final$P.adjusted_M1<=0.1)])
## [1] 71
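The counts above can also be obtained with a small helper that sums a logical vector while skipping missing P-values. This is only a convenience sketch; `count_sig` and the toy vector are hypothetical.

```r
# Hypothetical helper: number of P-values at or below a threshold, ignoring NA
count_sig <- function(p, alpha = 0.05) {
  sum(p <= alpha, na.rm = TRUE)
}

# Toy P-values (illustration only)
p_toy <- c(0.01, 0.05, 0.08, NA, 0.50)
n_05 <- count_sig(p_toy)              # threshold 0.05
n_10 <- count_sig(p_toy, alpha = 0.1) # threshold 0.1
print(c(n_05 = n_05, n_10 = n_10))
```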

P-VALUE DIAGNOSTICS BY HISTOGRAM ANALYSIS

Figure 3: P-value diagnostic by histogram Method 1


CONCLUSIONS METHOD 1

Method 1 was too rigid because n=2. Even a small standard deviation between Round 1 and Round 2 makes an observation non-significant. The method places the entire weight on the variability between Round 1 and Round 2 rather than on the deviation from the mean of the intermediate 50%. Statistical Method 1 was therefore discarded after careful evaluation.
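The behaviour described in this conclusion can be reproduced with toy numbers. In the sketch below, the reference vector is an evenly spaced grid spanning roughly the IQR reported earlier, and both strains are hypothetical; none of the values come from the screen.

```r
# Toy reference set standing in for the intermediate 50% (illustration only)
reference <- seq(-0.024, 0.072, length.out = 1000)

# Hypothetical strain A: large deviation from the reference,
# but the two round means differ from each other
strain_far <- c(0.30, 0.40)

# Hypothetical strain B: small deviation from the reference,
# but the two round means agree almost perfectly
strain_near <- c(0.050, 0.051)

# Welch two-sample, two-sided t-test, as used for Method 1
p_far  <- t.test(reference, strain_far)$p.value
p_near <- t.test(reference, strain_near)$p.value
print(c(p_far = p_far, p_near = p_near))
```

With only two observations the Welch degrees of freedom are close to 1 whenever the strain's own variance dominates, so the strain with the large effect is not significant at 0.05 while the strain with the small but reproducible effect is.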

METHOD 2

For METHOD 2, we hypothesized that the difference between the mean (µ) phenotypic performance (LPI GT) of a specific CRISPRi strain (StrainX) in an independent experimental round (each with three technical replicates, i.e. n=3) and the mean phenotypic performance of all replicates of the CRISPRi control strains (with gRNAs targeting no genetic locus in S. cerevisiae) in that same screening round is zero, and that any difference within the phenotypic performance range of the CRISPRi control strains (LPI GT range) arises by chance.

Null Hypothesis : µStrainX(LPI_GTReplica1, LPI_GTReplica2, LPI_GTReplica3)- µCRISPRi_Control_Strains(LPI_GT) = 0

In this method, P-values for each strain were estimated for each round, and only strains that showed significant performance in both rounds were considered for further analysis.

First, we clone the dataset under a new name to avoid any distortion down the line.

Analysis_Final_2 <- Analysis_Final
str(Analysis_Final_2)
## 'data.frame':    9078 obs. of  91 variables:
##  $ gRNA_name             : chr  "AAR2-NRg-3" "AAR2-NRg-4" "AAR2-TRg-15" "AAR2-TRg-16" ...
##  $ Seq                   : chr  "CCAGCGATAAGGAGGATCTT" "TGTGTCCTTTCTTCATCTCT" "AAAAGGAAAAAGTAATTAGG" "GTGAAAAGGAAAAAGTAATT" ...
##  $ SOURCEPLATEID         : chr  "R2877.H.023" "R2877.H.024" "R2877.H.023" "R2877.H.023" ...
##  $ SOURCECOLONYCOLUMN    : int  5 21 22 20 21 18 6 7 9 8 ...
##  $ SOURCECOLONYROW       : chr  "O" "L" "P" "N" ...
##  $ GENE                  : chr  "AAR2" "AAR2" "AAR2" "AAR2" ...
##  $ Control.gRNA          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Location_1536         : chr  "AC9" "W41" "AE43" "AA39" ...
##  $ CTRL_GT_RND1_R1       : num  0.00397 0.00586 0.10456 -0.01142 0.0174 ...
##  $ CTRL_GT_RND1_R2       : num  -0.0086 0.00166 0.07118 0.00256 0.04094 ...
##  $ CTRL_GT_RND1_R3       : num  -0.000625 -0.043918 -0.009614 0.007969 -0.014018 ...
##  $ CTRL_GT_RND2_R1       : num  0.0431 -0.0336 0.0292 0.0381 0.0742 ...
##  $ CTRL_GT_RND2_R2       : num  -0.0165 0.00744 0.01433 0.06599 -0.02973 ...
##  $ CTRL_GT_RND2_R3       : num  0.03266 -0.04468 -0.04455 0.01994 -0.00924 ...
##  $ CTRL_GT_RND1_MEAN     : num  -0.00175 -0.01213 0.05538 -0.0003 0.01477 ...
##  $ CTRL_GT_RND2_MEAN     : num  0.019756 -0.023599 -0.000341 0.041342 0.011747 ...
##  $ CTRL_GT_RND1_SD       : num  0.00636 0.02761 0.05871 0.01001 0.02757 ...
##  $ CTRL_GT_RND2_SD       : num  0.0318 0.0275 0.039 0.0232 0.0551 ...
##  $ CTRL_GT_RND1_2_MEAN   : num  0.009 -0.0179 0.0275 0.0205 0.0133 ...
##  $ CTRL_GT_RND1_2_MED    : num  0.00167 -0.01595 0.02176 0.01395 0.00408 ...
##  $ CTRL_GT_RND1_2_SD     : num  0.0237 0.0254 0.054 0.0278 0.039 ...
##  $ AA_GT_RND1_R1         : num  -0.0239 0.06 -0.0178 -0.1839 0.5993 ...
##  $ AA_GT_RND1_R2         : num  -0.0311 0.0763 -0.0145 -0.1293 0.2619 ...
##  $ AA_GT_RND1_R3         : num  0.0315 -0.0191 -0.0778 -0.0514 0.3276 ...
##  $ AA_GT_RND2_R1         : num  0.0142 -0.1375 0.111 0.0105 0.2021 ...
##  $ AA_GT_RND2_R2         : num  0.0383 -0.2266 0.0588 0.0966 0.1831 ...
##  $ AA_GT_RND2_R3         : num  0.0265 -0.2298 0.0789 0.0787 0.0114 ...
##  $ AA_GT_RND1_MEAN       : num  -0.00781 0.03905 -0.03671 -0.12151 0.39624 ...
##  $ AA_GT_RND2_MEAN       : num  0.0263 -0.198 0.0829 0.0619 0.1322 ...
##  $ AA_GT_RND1_SD         : num  0.0343 0.051 0.0356 0.0666 0.1789 ...
##  $ AA_GT_RND2_SD         : num  0.0121 0.0524 0.0263 0.0455 0.105 ...
##  $ AA_GT_RND1_2_MEAN     : num  0.00925 -0.07945 0.02308 -0.02979 0.2642 ...
##  $ AA_GT_RND1_2_MED      : num  0.0203 -0.0783 0.0221 -0.0205 0.232 ...
##  $ AA_GT_RND1_2_SD       : num  0.0296 0.1378 0.0712 0.1127 0.1953 ...
##  $ LPI_GT_RND1_R1        : num  -0.0278 0.0541 -0.1224 -0.1724 0.5819 ...
##  $ LPI_GT_RND1_R2        : num  -0.0225 0.0746 -0.0857 -0.1318 0.2209 ...
##  $ LPI_GT_RND1_R3        : num  0.0322 0.0248 -0.0682 -0.0594 0.3416 ...
##  $ LPI_GT_RND2_R1        : num  -0.0289 -0.1039 0.0818 -0.0276 0.1279 ...
##  $ LPI_GT_RND2_R2        : num  0.0548 -0.2341 0.0444 0.0306 0.2128 ...
##  $ LPI_GT_RND2_R3        : num  -0.0062 -0.1851 0.1234 0.0588 0.0206 ...
##  $ LPI_GT_RND1_MEAN      : num  -0.00605 0.05119 -0.09208 -0.12121 0.38147 ...
##  $ LPI_GT_RND2_MEAN      : num  0.00656 -0.17435 0.08321 0.02058 0.12042 ...
##  $ LPI_GT_RND1_SD        : num  0.0332 0.025 0.0276 0.0573 0.1837 ...
##  $ LPI_GT_RND2_SD        : num  0.0433 0.0657 0.0395 0.0441 0.0963 ...
##  $ LPI_GT_RND1_2_MEAN    : num  0.000251 -0.061582 -0.004437 -0.050313 0.250944 ...
##  $ LPI_GT_RND1_2_MED     : num  -0.0143 -0.0396 -0.0119 -0.0435 0.2169 ...
##  $ LPI_GT_RND1_2_SD      : num  0.0352 0.1313 0.1007 0.0901 0.1941 ...
##  $ CTRL_Y_RND1_R1        : num  0.055 0.0573 0.0189 0.0779 -0.1151 ...
##  $ CTRL_Y_RND1_R2        : num  0.0131 0.0472 0.0129 0.0562 -0.0723 ...
##  $ CTRL_Y_RND1_R3        : num  0.0306 0.0109 -0.0236 0.1208 -0.0171 ...
##  $ CTRL_Y_RND2_R1        : num  0.0399 0.0113 -0.1422 0.0605 -0.0676 ...
##  $ CTRL_Y_RND2_R2        : num  0.02531 0.0066 -0.17285 -0.00976 -0.08442 ...
##  $ CTRL_Y_RND2_R3        : num  -0.0528 0.0089 0.00598 0.06854 -0.08083 ...
##  $ CTRL_Y_RND1_MEAN      : num  0.03288 0.03847 0.00276 0.08497 -0.06817 ...
##  $ CTRL_Y_RND2_MEAN      : num  0.00414 0.00893 -0.10301 0.03977 -0.07762 ...
##  $ CTRL_Y_RND1_SD        : num  0.0211 0.0244 0.023 0.0329 0.0491 ...
##  $ CTRL_Y_RND2_SD        : num  0.04985 0.00234 0.09563 0.04308 0.00885 ...
##  $ CTRL_Y_RND1_2_MEAN    : num  0.0185 0.0237 -0.0501 0.0624 -0.0729 ...
##  $ CTRL_Y_RND1_2_MED     : num  0.028 0.0111 -0.0088 0.0645 -0.0766 ...
##  $ CTRL_Y_RND1_2_SD      : num  0.0377 0.0224 0.085 0.0423 0.032 ...
##  $ AA_Y_RND1_R1          : num  0.0672 0.0106 0.1785 0.272 -2.1 ...
##  $ AA_Y_RND1_R2          : num  -0.3832 0.0102 0.1196 0.232 -1.1873 ...
##  $ AA_Y_RND1_R3          : num  -0.1599 -0.0465 -0.0381 0.1036 -1.0143 ...
##  $ AA_Y_RND2_R1          : num  0.0503 0.3083 -0.2434 0.0968 -0.4223 ...
##  $ AA_Y_RND2_R2          : num  -0.0505 0.4391 -0.2526 -0.094 -0.304 ...
##  $ AA_Y_RND2_R3          : num  -0.00342 0.52795 -0.20049 -0.02924 -0.26528 ...
##  $ AA_Y_RND1_MEAN        : num  -0.15864 -0.00857 0.08665 0.20251 -1.43385 ...
##  $ AA_Y_RND2_MEAN        : num  -0.00121 0.42511 -0.23216 -0.00879 -0.33052 ...
##  $ AA_Y_RND1_SD          : num  0.2252 0.0329 0.112 0.088 0.5834 ...
##  $ AA_Y_RND2_SD          : num  0.0504 0.1105 0.0278 0.097 0.0818 ...
##  $ AA_Y_RND1_2_MEAN      : num  -0.0799 0.2083 -0.0728 0.0969 -0.8822 ...
##  $ AA_Y_RND1_2_MED       : num  -0.027 0.159 -0.119 0.1 -0.718 ...
##  $ AA_Y_RND1_2_SD        : num  0.17 0.248 0.189 0.142 0.71 ...
##  $ LPI_Y_RND1_R1         : num  0.0122 -0.0467 0.1595 0.1942 -1.9849 ...
##  $ LPI_Y_RND1_R2         : num  -0.3963 -0.0369 0.1066 0.1757 -1.1149 ...
##  $ LPI_Y_RND1_R3         : num  -0.1905 -0.0575 -0.0145 -0.0172 -0.9972 ...
##  $ LPI_Y_RND2_R1         : num  0.0104 0.297 -0.1012 0.0363 -0.3547 ...
##  $ LPI_Y_RND2_R2         : num  -0.0758 0.4325 -0.0797 -0.0842 -0.2196 ...
##  $ LPI_Y_RND2_R3         : num  0.0494 0.519 -0.2065 -0.0978 -0.1844 ...
##  $ LPI_Y_RND1_MEAN       : num  -0.1915 -0.047 0.0839 0.1175 -1.3657 ...
##  $ LPI_Y_RND2_MEAN       : num  -0.00536 0.41618 -0.12915 -0.04856 -0.2529 ...
##  $ LPI_Y_RND1_SD         : num  0.2042 0.0103 0.0892 0.1171 0.5395 ...
##  $ LPI_Y_RND2_SD         : num  0.0641 0.1119 0.0678 0.0738 0.0899 ...
##  $ LPI_Y_RND1_2_MEAN     : num  -0.0984 0.1846 -0.0226 0.0345 -0.8093 ...
##  $ LPI_Y_RND1_2_MED      : num  -0.03273 0.13006 -0.04712 0.00953 -0.67593 ...
##  $ LPI_Y_RND1_2_SD       : num  0.169 0.263 0.137 0.126 0.701 ...
##  $ CTRL_GT_MEAN_RND1_2_SD: num  0.01521 0.00811 0.0394 0.02945 0.00214 ...
##  $ AA_GT_MEAN_RND1_2_SD  : num  0.0241 0.1676 0.0846 0.1297 0.1867 ...
##  $ LPI_GT_MEAN_RND1_2_SD : num  0.00892 0.15948 0.12395 0.10026 0.18459 ...
##  $ P_value_M1            : num  0.178 0.594 0.814 0.494 0.33 ...
##  $ P.adjusted_M1         : num  0.91 0.911 0.952 0.91 0.91 ...
EXTRACT CRISPRi CONTROL STRAINS DATA

Extract the CRISPRi-control strains LPI GT data from ROUND1 and ROUND2, respectively and store the output in two different vectors.

  • ROUND 1
CRISPRi_Ctrl_Round1 <- whole_data_CRISPRi_aa_2$LPI_GT[which(whole_data_CRISPRi_aa_2$Control.gRNA == 1 
                                                            & whole_data_CRISPRi_aa_2$Round_ID=="1st_round")]
summary(CRISPRi_Ctrl_Round1)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -0.06721  0.09108  0.16819  0.15842  0.20944  0.38829
  • ROUND 2
CRISPRi_Ctrl_Round2 <- whole_data_CRISPRi_aa_2$LPI_GT[which(whole_data_CRISPRi_aa_2$Control.gRNA == 1 
                                                           & whole_data_CRISPRi_aa_2$Round_ID=="2nd_round")]
summary(CRISPRi_Ctrl_Round2)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -0.21609 -0.07291 -0.02904 -0.01828  0.04256  0.15629
ESTIMATE P-VALUES FOR ROUND 1 AND 2

P-value is estimated by Welch two sample two-sided t-test (an adaptation of Student’s t-test)

for(i in 1:nrow(Analysis_Final_2)){
  test1 <- t(Analysis_Final_2[i, 35:37])
  test2 <- t(Analysis_Final_2[i, 38:40])
  if(sum(!is.na(test1[, 1]))>=2){
    P_value_RND1 <- t.test(CRISPRi_Ctrl_Round1, test1[which(!is.na(test1[, 1]))])
    Analysis_Final_2[i, 92] <- P_value_RND1$p.value
  } else {
    Analysis_Final_2[i, 92] <- NA
  }
  if(sum(!is.na(test2[, 1]))>=2){
    P_value_RND2 <- t.test(CRISPRi_Ctrl_Round2, test2[which(!is.na(test2[, 1]))])
    Analysis_Final_2[i, 93] <- P_value_RND2$p.value
  } else {
    Analysis_Final_2[i, 93] <- NA
  }
}
colnames(Analysis_Final_2)[92:93] <- c("P_value_RND1_M2", "P_value_RND2_M2")
FALSE DISCOVERY RATE ADJUSTMENT OF P-VALUES FOR ROUND 1 AND 2

P-value adjustment by BENJAMINI-HOCHBERG False Discovery Rate (FDR) method

Analysis_Final_2[which(!is.na(Analysis_Final_2$P_value_RND1_M2)), 94] <- p.adjust(Analysis_Final_2$P_value_RND1_M2[which(!is.na(Analysis_Final_2$P_value_RND1_M2))], 
                                                                                  method = "BH", 
                                                                                  n = length(Analysis_Final_2$P_value_RND1_M2[which(!is.na(Analysis_Final_2$P_value_RND1_M2))]))

Analysis_Final_2[which(!is.na(Analysis_Final_2$P_value_RND2_M2)), 95] <- p.adjust(Analysis_Final_2$P_value_RND2_M2[which(!is.na(Analysis_Final_2$P_value_RND2_M2))], 
                                                                                  method = "BH", 
                                                                                  n = length(Analysis_Final_2$P_value_RND2_M2[which(!is.na(Analysis_Final_2$P_value_RND2_M2))]))

colnames(Analysis_Final_2)[94:95] <- c("P.adjusted_RND1_M2", "P.adjusted_RND2_M2")
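Method 2 keeps only strains that are significant in both rounds. One way this filter could be written is sketched below on a toy data.frame; the threshold `alpha = 0.05` and the toy rows are assumptions for illustration, and the real selection may also involve LPI GT thresholds that are not shown here.

```r
# Toy table reusing the Method 2 column names (values are made up)
toy_m2 <- data.frame(
  gRNA_name          = c("g1", "g2", "g3", "g4"),
  P.adjusted_RND1_M2 = c(0.01, 0.03, 0.20, NA),
  P.adjusted_RND2_M2 = c(0.04, 0.30, 0.01, 0.02),
  stringsAsFactors   = FALSE
)

alpha <- 0.05  # assumed significance threshold

# A strain passes only if both adjusted P-values exist and are at or below alpha
sig_both <- with(toy_m2,
                 !is.na(P.adjusted_RND1_M2) & !is.na(P.adjusted_RND2_M2) &
                   P.adjusted_RND1_M2 <= alpha & P.adjusted_RND2_M2 <= alpha)

hits_both_rounds <- toy_m2$gRNA_name[sig_both]
print(hits_both_rounds)
```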
P-VALUE DIAGNOSTICS FOR METHOD 2 : ROUND 1

NUMBER OF SIGNIFICANT STRAINS

length(Analysis_Final_2$P_value_RND1_M2[which(Analysis_Final_2$P_value_RND1_M2<=0.05)])
## [1] 4601
length(Analysis_Final_2$P.adjusted_RND1_M2[which(Analysis_Final_2$P.adjusted_RND1_M2<=0.05)])
## [1] 3389
length(Analysis_Final_2$P_value_RND1_M2[which(Analysis_Final_2$P_value_RND1_M2<=0.1)])
## [1] 5635
length(Analysis_Final_2$P.adjusted_RND1_M2[which(Analysis_Final_2$P.adjusted_RND1_M2<=0.1)])
## [1] 4692

P-VALUE DIAGNOSTICS BY HISTOGRAM ANALYSIS ROUND 1

Figure 4: P-value diagnostic by histogram, Method 2, Round 1


P-VALUE DIAGNOSTICS FOR METHOD 2 : ROUND 2

NUMBER OF SIGNIFICANT STRAINS

length(Analysis_Final_2$P_value_RND2_M2[which(Analysis_Final_2$P_value_RND2_M2<=0.05)])
## [1] 2304
length(Analysis_Final_2$P.adjusted_RND2_M2[which(Analysis_Final_2$P.adjusted_RND2_M2<=0.05)])
## [1] 987
length(Analysis_Final_2$P_value_RND2_M2[which(Analysis_Final_2$P_value_RND2_M2<=0.1)])
## [1] 3174
length(Analysis_Final_2$P.adjusted_RND2_M2[which(Analysis_Final_2$P.adjusted_RND2_M2<=0.1)])
## [1] 1431

P-VALUE DIAGNOSTICS BY HISTOGRAM ANALYSIS ROUND 2

Figure 5: P-value diagnostic by histogram, Method 2, Round 2


CONCLUSIONS METHOD 2

It is a robust statistical method. However, one major problem with this method is that it requires setting different thresholds for the adjusted P-values and the LPI GT mean for each round.

METHOD 3 AND METHOD 4

For METHOD 3, we hypothesized that the difference between the mean (µ) phenotypic performance of a specific CRISPRi strain (StrainX), considering all technical replicates (3 in each) of the two independent experimental rounds (i.e. n=6), and the mean phenotypic performance of all CRISPRi strains that fall within the interquartile range (IQR) of the complete dataset is zero, and that any deviation within the IQR arises by chance.

Null Hypothesis : µStrainX(All_replicates_LPI_GT)- µ(InterquartileRange_LPI_GT) = 0

Additionally, we tested one final statistical model to determine the significance of our observations.

For METHOD 4, we hypothesized that the difference between the mean (µ) phenotypic performance of a specific CRISPRi strain (StrainX), considering all technical replicates (3 in each) of the two independent experimental rounds (i.e. n=6), and the mean phenotypic performance of all CRISPRi control strains (with gRNAs targeting no genetic locus in S. cerevisiae) is zero, and that any difference within the phenotypic performance range of the CRISPRi control strains (LPI GT range) arises by chance.

Null Hypothesis : µStrainX(All_replicates_LPI_GT) - µCRISPRi_Control_Strains(LPI_GT) = 0

To ensure that we don’t distort the original dataset, we clone the Analysis dataset under a new name.

Analysis_Final_3 <- Analysis_Final_2
EXTRACT ALL LPI GT DATA POINTS (INCLUDING ALL REPLICATES) WITHIN INTER-QUARTILE-RANGE (IQR)

Since we consider all replicates this time, Method 3 compares each strain with all individual replicate values (NOT THE MEANS) that fall within the IQR. For this purpose, we extract the IQR dataset including all the replicate data for each strain. We use the data.frame Data_CRISPRi_aa (see REMOVE ROWS WITH SPATIAL CONTROL STRAIN DATA) to extract this numeric vector.

BOX PLOT - RELATIVE GENERATION TIME (LPI GT)

Figure 6: Boxplot of relative generation time (LPI GT) for all strains including all replicates in the library


Display Box-plot statistics

boxplot_stat_LPI_GT$stats 
##             [,1]
## [1,] -0.25630118
## [2,] -0.04373846
## [3,]  0.02045212
## [4,]  0.09804938
## [5,]  0.31050530
  • 25th Percentile = -0.04373846
  • 75th Percentile = 0.09804938

Therefore, extraction of the data points within IQR

Intermediate_50_M3 <- Data_CRISPRi_aa$LPI_GT[which(Data_CRISPRi_aa$LPI_GT >=-0.04373846
                                                   &Data_CRISPRi_aa$LPI_GT<=0.09804938)]
summary(Intermediate_50_M3)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -0.04374 -0.01004  0.02045  0.02236  0.05325  0.09805
EXTRACT CRISPRi CONTROL STRAINS DATA (ALL REPLICATES)

This time we extract all the replicate data (not the mean) of each CRISPRi control strain for Method 4.

Crispri_control_M4 <- Data_CRISPRi_aa$LPI_GT[which(Data_CRISPRi_aa$Control.gRNA==1)]
summary(Crispri_control_M4)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -0.21609 -0.03042  0.06889  0.07007  0.16628  0.38829
RECALCULATE THE LSC GT MEAN AT BASAL CONDITION AND LPI GT MEAN OF EACH STRAIN

We recalculate the above parameters taking all six replicates into account and excluding the missing values. For this purpose, we drop the NA entries before averaging. A strain therefore gets an LSC GT / LPI GT value if at least one replicate managed to grow in a particular condition; otherwise the result is a missing value (NaN, which R treats as NA).

Additionally, we create two columns that show the number of replicates of a strain that managed to grow in the basal condition (n_CTRL) and in the acetic acid condition (n_LPI).

for(i in 1:nrow(Analysis_Final_3)){
  test1 <- t(Analysis_Final_3[i, 9:14])
  test2 <- t(Analysis_Final_3[i, 35:40])
  x1 <- sum(!is.na(test1[, 1]))
  x2 <- sum(!is.na(test2[, 1]))
  CTRL_GT_Mean_temp <- mean(test1[which(!is.na(test1[, 1]))])
  LPI_GT_Mean_temp <- mean(test2[which(!is.na(test2[, 1]))])
  Analysis_Final_3[i, 96] <- CTRL_GT_Mean_temp
  Analysis_Final_3[i, 97] <- x1
  Analysis_Final_3[i, 98] <- LPI_GT_Mean_temp
  Analysis_Final_3[i, 99] <- x2
}
colnames(Analysis_Final_3)[96:99] <- c("CTRL_GT_Mean_all", "n_CTRL", "LPI_GT_Mean_all", "n_LPI")
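The same per-strain mean and replicate count can be obtained without an explicit loop by using `rowMeans()` and `rowSums()`. The sketch below runs on a toy matrix with hypothetical strain names and values. In the real table the inputs would be the six replicate columns (9:14 for the basal condition, 35:40 for LPI GT).

```r
# Toy replicate matrix: one row per strain, six replicate columns (illustration only)
toy_reps <- rbind(
  strainA = c(0.1, 0.2, NA, 0.3, NA, 0.4),  # four replicates grew
  strainB = rep(NA_real_, 6)                # no replicate grew
)

# Number of replicates with a measured value
n_grown <- rowSums(!is.na(toy_reps))

# Mean over the measured replicates; a row with no values yields NaN,
# which is.na() reports as missing
mean_all <- rowMeans(toy_reps, na.rm = TRUE)

print(data.frame(n_grown, mean_all))
```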
ESTIMATE P-VALUES FOR METHOD 3 AND 4

P-value is estimated by Welch two sample two-sided t-test (an adaptation of Student’s t-test)

for(i in 1:nrow(Analysis_Final_3)){
  # The six LPI_GT replicates of strain i (columns 35:40)
  test <- t(Analysis_Final_3[i, 35:40])
  x <- sum(!is.na(test[, 1]))
  if(x>2){
    # t.test() defaults to Welch's test (var.equal = FALSE), two-sided
    P.value_temp_M3 <- t.test(Intermediate_50_M3, test[which(!is.na(test[, 1]))])
    P.value_temp_M4 <- t.test(Crispri_control_M4, test[which(!is.na(test[, 1]))])
    Analysis_Final_3[i, 100] <- P.value_temp_M3$p.value
    Analysis_Final_3[i, 101] <- P.value_temp_M4$p.value
  } else {
    # Fewer than three replicates grew: no P-value is estimated
    Analysis_Final_3[i, 100] <- NA
    Analysis_Final_3[i, 101] <- NA
  }
}
colnames(Analysis_Final_3)[100:101] <- c("P.value_M3", "P.value_M4")
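The test applied to each strain can be illustrated on its own. The sketch below uses simulated numbers and a hypothetical helper name (welch_p); only t.test() and its default Welch behaviour are taken from base R, everything else is an assumption for illustration.

```r
set.seed(1)

# Hypothetical reference distribution (stands in for Intermediate_50_M3)
reference <- rnorm(200, mean = 0.02, sd = 0.04)

# Hypothetical helper: Welch P-value of one strain against the reference,
# returning NA when fewer than min_n replicates were measured
welch_p <- function(reference, replicates, min_n = 3) {
  obs <- replicates[!is.na(replicates)]
  if (length(obs) < min_n) return(NA_real_)
  t.test(reference, obs)$p.value   # var.equal = FALSE by default (Welch)
}

# A strain with four measured replicates far below the reference
p_strong <- welch_p(reference, c(-0.30, -0.25, NA, -0.28, -0.35, NA))

# A strain with only two measured replicates: no P-value
p_missing <- welch_p(reference, c(0.10, NA, NA, NA, 0.20, NA))
```

Wrapping the guard and the test in one function keeps the minimum replicate number in a single place.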
FALSE DISCOVERY RATE ADJUSTMENT OF P-VALUES FOR METHOD 3 AND 4

P-value adjustment by BENJAMINI-HOCHBERG False Discovery Rate (FDR) method

Analysis_Final_3[which(!is.na(Analysis_Final_3$P.value_M3)), 102] <- p.adjust(Analysis_Final_3$P.value_M3[which(!is.na(Analysis_Final_3$P.value_M3))], 
                                                                              method = "BH", 
                                                                              n = length(Analysis_Final_3$P.value_M3[which(!is.na(Analysis_Final_3$P.value_M3))]))
Analysis_Final_3[which(!is.na(Analysis_Final_3$P.value_M4)), 103] <- p.adjust(Analysis_Final_3$P.value_M4[which(!is.na(Analysis_Final_3$P.value_M4))], 
                                                                              method = "BH", 
                                                                              n = length(Analysis_Final_3$P.value_M4[which(!is.na(Analysis_Final_3$P.value_M4))]))
colnames(Analysis_Final_3)[102:103] <- c("P.adjusted_M3", "P.adjusted_M4")
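To make the adjustment step concrete, the sketch below applies the Benjamini-Hochberg correction to a small hypothetical P-value vector with missing entries, once with p.adjust() and once by hand. The numbers and object names are illustrative assumptions.

```r
# Hypothetical raw P-values; NA marks strains without a P-value
p_raw <- c(0.001, 0.02, NA, 0.04, 0.30, NA, 0.008)
ok <- !is.na(p_raw)

# Adjust only the non-missing P-values and keep NA in the other positions
p_adj <- rep(NA_real_, length(p_raw))
p_adj[ok] <- p.adjust(p_raw[ok], method = "BH")

# The same correction by hand: p * n / rank, made monotone from the largest P-value down
p <- p_raw[ok]
n <- length(p)
o <- order(p, decreasing = TRUE)
manual <- numeric(n)
manual[o] <- pmin(1, cummin(p[o] * n / (n:1)))
```

Here n is the number of tests actually performed, which is why the missing values are excluded before the correction.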
P-VALUE DIAGNOSTICS FOR METHOD 3

NUMBER OF SIGNIFICANT STRAINS

length(Analysis_Final_3$P.value_M3[which(Analysis_Final_3$P.value_M3<=0.05)])
## [1] 2468
length(Analysis_Final_3$P.adjusted_M3[which(Analysis_Final_3$P.adjusted_M3<=0.05)])
## [1] 514
length(Analysis_Final_3$P.value_M3[which(Analysis_Final_3$P.value_M3<=0.1)])
## [1] 3392
length(Analysis_Final_3$P.adjusted_M3[which(Analysis_Final_3$P.adjusted_M3<=0.1)])
## [1] 1258

P-VALUE DIAGNOSTICS BY HISTOGRAM ANALYSIS

Figure 7 (Fig S11 in Manuscript): P-value diagnostic by histogram, Method 3


P-VALUE DIAGNOSTICS FOR METHOD 4

NUMBER OF SIGNIFICANT STRAINS

length(Analysis_Final_3$P.value_M4[which(Analysis_Final_3$P.value_M4<=0.05)])
## [1] 3663
length(Analysis_Final_3$P.adjusted_M4[which(Analysis_Final_3$P.adjusted_M4<=0.05)])
## [1] 2212
length(Analysis_Final_3$P.value_M4[which(Analysis_Final_3$P.value_M4<=0.1)])
## [1] 4545
length(Analysis_Final_3$P.adjusted_M4[which(Analysis_Final_3$P.adjusted_M4<=0.1)])
## [1] 3306

P-VALUE DIAGNOSTICS BY HISTOGRAM ANALYSIS

Figure 8: P-value diagnostic by histogram, Method 4


CONCLUSIONS METHOD 3

The P-values generated by Method 3 can be corrected efficiently with the FDR method, and after the correction the P.adjusted values are nearly evenly distributed, which indicates a robust statistical outcome. Method 3 is therefore a suitable statistical method for this dataset.

CONCLUSIONS METHOD 4

Although Method 4 is effective at identifying the candidates that deviate most from the CRISPRi control mean, the FDR method is less effective on the resulting P-values, which makes Method 4 less efficient for the current dataset. Moreover, the CRISPRi control strains for some reason consistently displayed slower growth under acetic acid than the mean of the population. This biased candidate selection by Method 4.

FINAL CONCLUSION FOR STATISTICAL ANALYSIS

Out of the 4 statistical methods evaluated, METHOD 3 was the most promising method to identify the significant candidates. Therefore, for this study we considered the results of statistical Method 3 for further downstream analysis.

SETTING THE STATISTICAL AND EFFECT SIZE THRESHOLD

Number of strains with Adjusted P-value ≤ 0.1

length(Analysis_Final_3$P.adjusted_M3[which(Analysis_Final_3$P.adjusted_M3 <= 0.1)])
## [1] 1258

To avoid missing potential candidates just because of high variability among the replicates, we keep the adjusted P-value threshold less strict i.e. ≤ 0.1. In addition, we introduce an effect size threshold i.e. the phenotypic performance range of CRISPRi control strains.

  • Estimating the Effect size threshold to identify acetic acid sensitive candidates
max(Analysis_Final_3$LPI_GT_Mean_all[which(Analysis_Final_3$Control.gRNA==1)])
## [1] 0.165662

Therefore, any strain that has an adjusted P-value ≤ 0.1 AND a mean LPI GT > 0.165662 will be considered SENSITIVE to acetic acid

  • Estimating the Effect size threshold to identify acetic acid tolerant candidates
min(Analysis_Final_3$LPI_GT_Mean_all[which(Analysis_Final_3$Control.gRNA==1)])
## [1] -0.03680838

Therefore, any strain that has an adjusted P-value ≤ 0.1 AND a mean LPI GT < -0.03680838 will be considered TOLERANT to acetic acid
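The two-step rule (statistical threshold plus effect size threshold taken from the control range) can be sketched on a toy table. All strain names, values and column names below are hypothetical; only the logic follows the text above.

```r
# Hypothetical toy table: two control strains and four library strains
toy <- data.frame(
  strain      = c("ctrl_a", "ctrl_b", "s1", "s2", "s3", "s4"),
  is_control  = c(1, 1, 0, 0, 0, 0),
  lpi_gt_mean = c(-0.03, 0.15, 0.40, -0.20, 0.30, 0.05),
  p_adjusted  = c(0.80, 0.60, 0.01, 0.04, 0.50, 0.02),
  stringsAsFactors = FALSE
)

# Effect size thresholds: the phenotypic range spanned by the control strains
upper <- max(toy$lpi_gt_mean[toy$is_control == 1])
lower <- min(toy$lpi_gt_mean[toy$is_control == 1])

# Statistical threshold on the adjusted P-value
significant <- !is.na(toy$p_adjusted) & toy$p_adjusted <= 0.1

# A strain needs to pass both thresholds to be called
toy$call <- "unchanged"
toy$call[significant & toy$lpi_gt_mean > upper] <- "sensitive"
toy$call[significant & toy$lpi_gt_mean < lower] <- "tolerant"
```

In this toy example s3 lies outside the control range but is not significant, and s4 is significant but lies inside the control range, so neither is called.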

EXTRACT THE ACETIC ACID TOLERANT STRAINS

  • Extract the row indices that satisfy the statistical (adjusted P-value ≤ 0.1) and effect size (mean LPI GT < -0.03680838) criteria for acetic acid tolerant candidates
candidate_padj_0.1_FIT_M3 <- which((Analysis_Final_3$LPI_GT_Mean_all < -0.03680838 & Analysis_Final_3$P.adjusted_M3<= 0.1))
length(candidate_padj_0.1_FIT_M3)
## [1] 478

This gives 478 ACETIC ACID TOLERANT strains

  • Extract the row data of acetic acid tolerant strains
Fit_M3_complete <- Analysis_Final_3[candidate_padj_0.1_FIT_M3, ]
Fit_M3_complete <- Fit_M3_complete[order(Fit_M3_complete$LPI_GT_Mean_all, decreasing = FALSE), ]
str(Fit_M3_complete)
## 'data.frame':    478 obs. of  103 variables:
##  $ gRNA_name             : chr  "RPN9-TRg-4" "RGL1-NRg-7" "RPN9-NRg-7" "POP3-NRg-5" ...
##  $ Seq                   : chr  "ACCCGCTCCCCGCTTTCATC" "GCTCTTGTTTAGTAGGCGTG" "ACCGGATGAAAGCGGGGAGC" "CAAATATCCGCCCTGGCAAT" ...
##  $ SOURCEPLATEID         : chr  "R2877.H.021" "R2877.H.002" "R2877.H.020" "R2877.H.022" ...
##  $ SOURCECOLONYCOLUMN    : int  1 3 7 19 4 3 18 7 1 3 ...
##  $ SOURCECOLONYROW       : chr  "B" "P" "N" "C" ...
##  $ GENE                  : chr  "RPN9" "RGL1" "RPN9" "POP3" ...
##  $ Control.gRNA          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Location_1536         : chr  "C1" "AE5" "AA13" "E37" ...
##  $ CTRL_GT_RND1_R1       : num  0.07603 -0.05695 0.02247 -0.01067 0.00419 ...
##  $ CTRL_GT_RND1_R2       : num  0.08332 -0.06969 -0.02313 0.01964 0.00952 ...
##  $ CTRL_GT_RND1_R3       : num  0.16235 -0.00953 0.00817 0.06949 -0.03925 ...
##  $ CTRL_GT_RND2_R1       : num  -0.0229 -0.01756 0.08764 0.0332 0.00409 ...
##  $ CTRL_GT_RND2_R2       : num  0.01055 -0.01944 -0.0114 0.02561 0.00428 ...
##  $ CTRL_GT_RND2_R3       : num  -0.00635 -0.06164 0.0289 0.03377 -0.01342 ...
##  $ CTRL_GT_RND1_MEAN     : num  0.10723 -0.04539 0.00251 0.02615 -0.00851 ...
##  $ CTRL_GT_RND2_MEAN     : num  -0.00624 -0.03288 0.03505 0.03086 -0.00168 ...
##  $ CTRL_GT_RND1_SD       : num  0.0479 0.0317 0.0233 0.0405 0.0267 ...
##  $ CTRL_GT_RND2_SD       : num  0.01672 0.02492 0.0498 0.00455 0.01016 ...
##  $ CTRL_GT_RND1_2_MEAN   : num  0.0505 -0.0391 0.0188 0.0285 -0.0051 ...
##  $ CTRL_GT_RND1_2_MED    : num  0.04329 -0.0382 0.01532 0.0294 0.00414 ...
##  $ CTRL_GT_RND1_2_SD     : num  0.0699 0.0264 0.0391 0.0259 0.0185 ...
##  $ AA_GT_RND1_R1         : num  -0.525 -0.44 -0.386 -0.34 -0.263 ...
##  $ AA_GT_RND1_R2         : num  -0.767 -0.484 -0.488 -0.312 -0.257 ...
##  $ AA_GT_RND1_R3         : num  -0.621 -0.338 -0.4 -0.375 -0.363 ...
##  $ AA_GT_RND2_R1         : num  -0.2124 -0.1735 0.0426 -0.107 -0.2219 ...
##  $ AA_GT_RND2_R2         : num  -0.172 -0.2 -0.12 -0.176 -0.256 ...
##  $ AA_GT_RND2_R3         : num  -0.1797 -0.2696 -0.1352 -0.0916 -0.1875 ...
##  $ AA_GT_RND1_MEAN       : num  -0.638 -0.421 -0.425 -0.342 -0.294 ...
##  $ AA_GT_RND2_MEAN       : num  -0.1881 -0.2144 -0.0709 -0.1249 -0.2218 ...
##  $ AA_GT_RND1_SD         : num  0.1214 0.0746 0.055 0.0317 0.0595 ...
##  $ AA_GT_RND2_SD         : num  0.0214 0.0496 0.0985 0.045 0.0343 ...
##  $ AA_GT_RND1_2_MEAN     : num  -0.413 -0.318 -0.248 -0.234 -0.258 ...
##  $ AA_GT_RND1_2_MED      : num  -0.369 -0.304 -0.261 -0.244 -0.257 ...
##  $ AA_GT_RND1_2_SD       : num  0.2583 0.1265 0.2066 0.124 0.0588 ...
##  $ LPI_GT_RND1_R1        : num  -0.602 -0.383 -0.409 -0.329 -0.267 ...
##  $ LPI_GT_RND1_R2        : num  -0.85 -0.414 -0.465 -0.332 -0.267 ...
##  $ LPI_GT_RND1_R3        : num  -0.783 -0.329 -0.408 -0.445 -0.324 ...
##  $ LPI_GT_RND2_R1        : num  -0.1895 -0.156 -0.0451 -0.1402 -0.226 ...
##  $ LPI_GT_RND2_R2        : num  -0.183 -0.181 -0.109 -0.202 -0.26 ...
##  $ LPI_GT_RND2_R3        : num  -0.173 -0.208 -0.164 -0.125 -0.174 ...
##  $ LPI_GT_RND1_MEAN      : num  -0.745 -0.375 -0.427 -0.368 -0.286 ...
##  $ LPI_GT_RND2_MEAN      : num  -0.182 -0.181 -0.106 -0.156 -0.22 ...
##  $ LPI_GT_RND1_SD        : num  0.1285 0.0432 0.0324 0.0661 0.0328 ...
##  $ LPI_GT_RND2_SD        : num  0.00809 0.02602 0.05954 0.04043 0.04341 ...
##  $ LPI_GT_RND1_2_MEAN    : num  -0.463 -0.278 -0.267 -0.262 -0.253 ...
##  $ LPI_GT_RND1_2_MED     : num  -0.395 -0.268 -0.286 -0.265 -0.263 ...
##  $ LPI_GT_RND1_2_SD      : num  0.319 0.1109 0.1812 0.1264 0.0497 ...
##  $ CTRL_Y_RND1_R1        : num  0.1577 -0.2668 -0.0429 0.2105 0.0193 ...
##  $ CTRL_Y_RND1_R2        : num  -0.0673 -0.2514 -0.0582 0.1997 0.0202 ...
##  $ CTRL_Y_RND1_R3        : num  0.2922 -0.0261 -0.0683 0.1684 0.058 ...
##  $ CTRL_Y_RND2_R1        : num  0.0912 -0.0829 -0.0174 0.0119 0.0743 ...
##  $ CTRL_Y_RND2_R2        : num  -0.19136 -0.09411 -0.00691 0.11114 0.07309 ...
##  $ CTRL_Y_RND2_R3        : num  0.295 -0.0282 -0.0197 -0.0138 0.0828 ...
##  $ CTRL_Y_RND1_MEAN      : num  0.1275 -0.1815 -0.0565 0.1929 0.0325 ...
##  $ CTRL_Y_RND2_MEAN      : num  0.0649 -0.0684 -0.0147 0.0364 0.0767 ...
##  $ CTRL_Y_RND1_SD        : num  0.1816 0.1347 0.0128 0.0219 0.0221 ...
##  $ CTRL_Y_RND2_SD        : num  0.24423 0.03528 0.00683 0.06596 0.00529 ...
##  $ CTRL_Y_RND1_2_MEAN    : num  0.0962 -0.1249 -0.0356 0.1147 0.0546 ...
##  $ CTRL_Y_RND1_2_MED     : num  0.1245 -0.0885 -0.0313 0.1398 0.0656 ...
##  $ CTRL_Y_RND1_2_SD      : num  0.1955 0.1077 0.0247 0.0963 0.0282 ...
##  $ AA_Y_RND1_R1          : num  1.696 -0.399 1.145 1.162 0.45 ...
##  $ AA_Y_RND1_R2          : num  1.589 -0.333 1.135 1.103 0.45 ...
##  $ AA_Y_RND1_R3          : num  1.528 -0.161 1.087 1.2 0.543 ...
##  $ AA_Y_RND2_R1          : num  0.644 0.28 0.11 0.187 0.405 ...
##  $ AA_Y_RND2_R2          : num  0.279 0.368 0.185 0.18 0.416 ...
##  $ AA_Y_RND2_R3          : num  0.461 0.606 0.27 0.147 0.494 ...
##  $ AA_Y_RND1_MEAN        : num  1.604 -0.298 1.122 1.155 0.481 ...
##  $ AA_Y_RND2_MEAN        : num  0.461 0.418 0.188 0.171 0.438 ...
##  $ AA_Y_RND1_SD          : num  0.0849 0.1229 0.0308 0.0486 0.0538 ...
##  $ AA_Y_RND2_SD          : num  0.1827 0.1685 0.0796 0.021 0.0484 ...
##  $ AA_Y_RND1_2_MEAN      : num  1.0327 0.0602 0.6551 0.6631 0.4594 ...
##  $ AA_Y_RND1_2_MED       : num  1.086 0.0597 0.6782 0.645 0.4497 ...
##  $ AA_Y_RND1_2_SD        : num  0.6389 0.4137 0.5143 0.5399 0.0514 ...
##  $ LPI_Y_RND1_R1         : num  1.538 -0.132 1.187 0.952 0.43 ...
##  $ LPI_Y_RND1_R2         : num  1.6563 -0.0818 1.1928 0.9034 0.4296 ...
##  $ LPI_Y_RND1_R3         : num  1.236 -0.135 1.155 1.031 0.485 ...
##  $ LPI_Y_RND2_R1         : num  0.553 0.363 0.128 0.175 0.331 ...
##  $ LPI_Y_RND2_R2         : num  0.4699 0.4621 0.1915 0.0684 0.3426 ...
##  $ LPI_Y_RND2_R3         : num  0.166 0.634 0.289 0.161 0.411 ...
##  $ LPI_Y_RND1_MEAN       : num  1.477 -0.116 1.178 0.962 0.448 ...
##  $ LPI_Y_RND2_MEAN       : num  0.396 0.486 0.203 0.135 0.361 ...
##  $ LPI_Y_RND1_SD         : num  0.2168 0.0299 0.0203 0.0644 0.0317 ...
##  $ LPI_Y_RND2_SD         : num  0.2035 0.1371 0.0813 0.0579 0.0432 ...
##  $ LPI_Y_RND1_2_MEAN     : num  0.937 0.185 0.691 0.548 0.405 ...
##  $ LPI_Y_RND1_2_MED      : num  0.894 0.141 0.722 0.539 0.42 ...
##  $ LPI_Y_RND1_2_SD       : num  0.621 0.3419 0.537 0.4564 0.0585 ...
##  $ CTRL_GT_MEAN_RND1_2_SD: num  0.08023 0.00885 0.02301 0.00333 0.00483 ...
##  $ AA_GT_MEAN_RND1_2_SD  : num  0.3179 0.146 0.2502 0.1537 0.0511 ...
##  $ LPI_GT_MEAN_RND1_2_SD : num  0.3981 0.1371 0.2272 0.1504 0.0463 ...
##  $ P_value_M1            : num  0.3346 0.1987 0.3234 0.228 0.0754 ...
##  $ P.adjusted_M1         : num  0.91 0.91 0.91 0.91 0.91 ...
##  $ P_value_RND1_M2       : num  5.58e-03 2.47e-04 9.67e-06 2.50e-03 3.55e-05 ...
##  $ P_value_RND2_M2       : num  5.25e-17 6.93e-04 1.15e-01 1.47e-02 7.13e-03 ...
##  $ P.adjusted_RND1_M2    : num  0.02164 0.002034 0.000126 0.01246 0.000397 ...
##  $ P.adjusted_RND2_M2    : num  2.24e-14 1.12e-02 3.06e-01 9.49e-02 5.98e-02 ...
##  $ CTRL_GT_Mean_all      : num  0.0505 -0.0391 0.0188 0.0285 -0.0051 ...
##  $ n_CTRL                : int  6 6 6 6 6 6 6 6 6 6 ...
##  $ LPI_GT_Mean_all       : num  -0.463 -0.278 -0.267 -0.262 -0.253 ...
##  $ n_LPI                 : int  6 6 6 6 6 6 6 6 6 6 ...
##   [list output truncated]

EXTRACT ALL CRISPRi TARGET GENES THAT INDUCED ACETIC ACID TOLERANCE

  • Extract the descriptions of all genes (1617 genes) involved in this study from the Saccharomyces Genome Database (SGD). A .csv file for this purpose already exists in the COMPILED_DATA folder

Gene description key file: Gene_List_CRISPRi_lib.csv

whole_Gene_list_Final <- read.csv("COMPILED_DATA/Gene_List_CRISPRi_lib.csv", na.strings = "", stringsAsFactors = FALSE)
rownames(whole_Gene_list_Final) <- whole_Gene_list_Final$LIB_ID
str(whole_Gene_list_Final)
## 'data.frame':    1617 obs. of  7 variables:
##  $ LIB_ID     : chr  "AAR2" "AAT1" "AAT2" "ABD1" ...
##  $ SGD_DB_ID  : chr  "S000000170" "S000001589" "S000004017" "S000000440" ...
##  $ SYS_ID     : chr  "YBL074C" "YKL106W" "YLR027C" "YBR236C" ...
##  $ GENE_SYM   : chr  "AAR2" "AAT1" "AAT2" "ABD1" ...
##  $ NAME       : chr  "A1-Alpha2 Repression" "Aspartate AminoTransferase" "Aspartate AminoTransferase" NA ...
##  $ PHENOTYPE  : chr  "Essential gene; conditional mutant is heat sensitive, loses viability at elevated temperature and displays elev"| __truncated__ "Non-essential gene; null mutant has a reduced respiratory growth rate and decreased competitive fitness on non-"| __truncated__ "Non-essential gene in S288C, but essential in the Sigma1278b background; S288C null mutant displays a decreased"| __truncated__ "Essential gene; temperature-sensitive mutation causes decreasing protein synthesis upon temperature shift; repr"| __truncated__ ...
##  $ DESCRIPTION: chr  "Component of the U5 snRNP complex; required for splicing of U3 precursors; originally described as a splicing f"| __truncated__ "Mitochondrial aspartate aminotransferase; catalyzes the conversion of oxaloacetate to aspartate in aspartate an"| __truncated__ "Cytosolic aspartate aminotransferase involved in nitrogen metabolism; localizes to peroxisomes in oleate-grown cells" "Methyltransferase; catalyzes the transfer of a methyl group from S-adenosylmethionine to the GpppN terminus of "| __truncated__ ...
  • Next, prepare a data.frame with descriptions of the CRISPRi target genes that induced acetic acid tolerance. This data.frame also includes how many gRNAs per target gene induced acetic acid tolerance.
Fit_all_M3 <- data.frame(sort(table(Analysis_Final_3$GENE[candidate_padj_0.1_FIT_M3]), decreasing = TRUE))
y <- as.character(Fit_all_M3$Var1)
x <- whole_Gene_list_Final[y, ]
Fit_all_M3_description <- cbind(Fit_all_M3, x[, -1])
str(Fit_all_M3_description)
## 'data.frame':    370 obs. of  8 variables:
##  $ Var1       : Factor w/ 370 levels "PEP7","RPN9",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ Freq       : int  5 5 5 4 3 3 3 3 3 3 ...
##  $ SGD_DB_ID  : chr  "S000002731" "S000002835" "S000001899" "S000006069" ...
##  $ SYS_ID     : chr  "YDR323C" "YDR427W" "YFR003C" "YPL148C" ...
##  $ GENE_SYM   : chr  "PEP7" "RPN9" "YPI1" "PPT2" ...
##  $ NAME       : chr  "carboxyPEPtidase Y-deficient" "Regulatory Particle Non-ATPase" "Yeast Phosphatase Inhibitor" "Phosphopantetheine:Protein Transferase" ...
##  $ PHENOTYPE  : chr  NA "Non-essential gene; null mutant is sensitive to elevated temperatures, shows cell cycle arrest in metaphase, an"| __truncated__ NA NA ...
##  $ DESCRIPTION: chr  "Adaptor protein involved in vesicle-mediated vacuolar protein sorting; multivalent adaptor protein; facilitates"| __truncated__ "Non-ATPase regulatory subunit of the 26S proteasome; similar to putative proteasomal subunits in other species;"| __truncated__ "Regulatory subunit of the type I protein phosphatase (PP1) Glc7p; Glc7p participates in the regulation of a var"| __truncated__ "Phosphopantetheine:protein transferase (PPTase); activates mitochondrial acyl carrier protein (Acp1p) by phosph"| __truncated__ ...
nrow(Fit_all_M3_description)
## [1] 370

This gives 370 CRISPRi target genes that induced acetic acid TOLERANCE
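The counting and annotation step above (gRNA hits per gene, then a lookup in the gene description key by row name) can be sketched as follows. Gene names and descriptions are hypothetical placeholders; the column names gene and Freq come from naming the table dimension explicitly.

```r
# Hypothetical gRNA-level hits: one entry per significant strain
hits <- c("GENE_B", "GENE_A", "GENE_B", "GENE_C", "GENE_B")

# Hypothetical gene description key, indexed by gene name
key <- data.frame(LIB_ID = c("GENE_A", "GENE_B", "GENE_C"),
                  DESCRIPTION = c("desc A", "desc B", "desc C"),
                  stringsAsFactors = FALSE)
rownames(key) <- key$LIB_ID

# Number of gRNAs per target gene, most frequent first
counts <- as.data.frame(sort(table(gene = hits), decreasing = TRUE),
                        stringsAsFactors = FALSE)

# Attach the descriptions by row name (genes absent from the key would give NA rows)
annotated <- cbind(counts, key[counts$gene, -1, drop = FALSE])
```

Genes hit by several independent gRNAs end up at the top of the table.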

EXTRACT THE ACETIC ACID SENSITIVE STRAINS

  • First, identify strains that grew well in the basal condition but for which no replicate, or fewer than three (out of six) replicates, managed to grow under acetic acid stress. We call these strains SUPER SENSITIVE. P-value estimation for these strains was not possible or was not performed, as n was ≤ 2.
super_sen_M3 <- Analysis_Final_3[which(!is.na(Analysis_Final_3$CTRL_GT_Mean_all)
                                       &(Analysis_Final_3$n_LPI<3)
                                       &(
                                         is.na(Analysis_Final_3$LPI_GT_Mean_all)
                                         |(Analysis_Final_3$LPI_GT_Mean_all> 0.165662)
                                       )
), ]
nrow(super_sen_M3)
## [1] 17

This gives 17 ACETIC ACID SUPER SENSITIVE strains
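The super sensitive filter combines three conditions and relies on how R treats missing values in logical expressions. The sketch below reproduces that logic on a hypothetical four-strain table (column names are assumptions); the threshold is the value estimated above.

```r
# Hypothetical toy table
toy <- data.frame(
  ctrl_mean = c(0.01, 0.02, NA, 0.00),   # NA = no growth in the basal condition
  n_lpi     = c(0, 2, 0, 6),             # replicates that grew in acetic acid
  lpi_mean  = c(NA, 0.50, NA, 0.30)      # NA = no replicate grew in acetic acid
)
threshold <- 0.165662

# TRUE | NA evaluates to TRUE, so a missing LPI mean is accepted by is.na() alone
cond <- !is.na(toy$ctrl_mean) &
  toy$n_lpi < 3 &
  (is.na(toy$lpi_mean) | toy$lpi_mean > threshold)

# which() returns the TRUE positions only and silently drops any remaining NA
super_idx <- which(cond)
```

Row 3 is excluded because the strain did not grow in the basal condition either, and row 4 because all six replicates grew in acetic acid.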

  • Next, extract the row indices that satisfy the statistical (adjusted P-value ≤ 0.1) and effect size (mean LPI GT > 0.165662) criteria for acetic acid sensitive candidates
candidate_padj_0.1_SEN_M3 <- which((Analysis_Final_3$LPI_GT_Mean_all > 0.165662 & Analysis_Final_3$P.adjusted_M3<= 0.1))
length(candidate_padj_0.1_SEN_M3)
## [1] 481

This gives 481 ACETIC ACID SENSITIVE strains.

  • Extract the row data of acetic acid sensitive strains
Sen_M3_complete <- rbind(super_sen_M3, Analysis_Final_3[candidate_padj_0.1_SEN_M3, ])
Sen_M3_complete <- Sen_M3_complete[order(Sen_M3_complete$LPI_GT_Mean_all, decreasing = TRUE), ]
nrow(Sen_M3_complete)
## [1] 498

In TOTAL, 481+17 = 498 strains displayed acetic acid SENSITIVITY

EXTRACT ALL CRISPRi TARGET GENES THAT INDUCED ACETIC ACID SENSITIVITY

  • Prepare a data.frame with descriptions of the CRISPRi target genes that induced acetic acid sensitivity. This data.frame also includes how many gRNAs per target gene induced acetic acid sensitivity.
Sen_all_M3 <- data.frame(sort(table(c(Analysis_Final_3$GENE[candidate_padj_0.1_SEN_M3], super_sen_M3$GENE)), decreasing = TRUE))
y <- as.character(Sen_all_M3$Var1)
x <- whole_Gene_list_Final[y, ]
Sen_all_M3_description <- cbind(Sen_all_M3, x[, -1])
nrow(Sen_all_M3_description)
## [1] 367

This gives 367 CRISPRi target genes that induced acetic acid SENSITIVITY

GO ANALYSIS

Data preparation

Extracting the SGD_ID for the unique genes in Fit_all_M3 (see EXTRACT ALL CRISPRi TARGET GENES THAT INDUCED ACETIC ACID TOLERANCE)

Fit_unique_M3 <- as.character(Fit_all_M3$Var1)
x <- whole_Gene_list_Final[Fit_unique_M3, ]
Fit_unique_M3_SGD_ID <- x$SGD_DB_ID
str(Fit_unique_M3_SGD_ID)
##  chr [1:370] "S000002731" "S000002835" "S000001899" "S000006069" ...

Extracting the SGD_ID for the unique genes in Sen_all_M3 (see EXTRACT ALL CRISPRi TARGET GENES THAT INDUCED ACETIC ACID SENSITIVITY)

Sen_unique_M3 <- as.character(Sen_all_M3$Var1)
x <- whole_Gene_list_Final[Sen_unique_M3, ]
Sen_unique_M3_SGD_ID <- x$SGD_DB_ID
str(Sen_unique_M3_SGD_ID)
##  chr [1:367] "S000003191" "S000003105" "S000006015" "S000001283" ...

Perform GO analysis with the above gene identifier sets in the Saccharomyces Genome Database (SGD)
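For the GO analysis, the identifier vectors need to leave R. A minimal sketch, assuming the GO tool accepts a plain-text list with one identifier per line (this input format is an assumption to check against the tool that is used), with a temporary file standing in for the real output path:

```r
# Example identifiers; the first two are taken from the output above, the third is a deliberate duplicate
ids <- c("S000002731", "S000002835", "S000002731", NA)

# Drop missing and duplicated identifiers before export
ids_clean <- unique(ids[!is.na(ids)])

# Write one identifier per line (tempfile() is a placeholder for a project path)
out_file <- tempfile(fileext = ".txt")
writeLines(ids_clean, out_file)
```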

DATA VISUALIZATION

Here we present the SCAN-O-MATIC data in graphs and charts.

PREREQUISITE PACKAGES

INSTALL

  • ggplot2
  • reshape
  • pheatmap
  • wordcloud

GROWTH CURVES

Plot some representative growth curves from scan-o-matic.

The growth curve data were generated by running the flatten_curves_2.py script (obtained from Simon Stenberg, Gothenburg University, Sweden, and available on request) in the scan-o-matic analysis folder generated within the project folder. The script generates a curves_flat.csv file in that analysis folder. For the representative growth curves, we generated curves_flat.csv for the project containing the growth output of plates 7 and 8 in the basal and acetic acid conditions in screening Round 1. The file was then renamed to Data_for_Representative_GC_SOM.csv and is available in our COMPILED_DATA folder.

  • Import data
Growth_curve_data <- read.csv("COMPILED_DATA/Data_for_Representative_GC_SOM.csv", sep = "\t", header = TRUE)
str(Growth_curve_data)
## 'data.frame':    255 obs. of  6145 variables:
##  $ X       : int  0 1 2 3 4 5 6 7 8 9 ...
##  $ X0_0_0  : num  21659 21516 21142 20760 20348 ...
##  $ X0_0_1  : num  181676 183347 189180 197935 209654 ...
##  $ X0_0_2  : num  152110 153658 158302 165449 175688 ...
##  $ X0_0_3  : num  153383 155001 160064 167504 177861 ...
##  $ X0_0_4  : num  150199 152044 157029 164412 174582 ...
##  $ X0_0_5  : num  157769 159504 165478 173945 185550 ...
##  $ X0_0_6  : num  157492 159423 165380 173741 185011 ...
##  $ X0_0_7  : num  163998 165903 172040 180626 192484 ...
##  $ X0_0_8  : num  160719 162364 167986 176187 187369 ...
##  $ X0_0_9  : num  158611 160418 165957 173936 184680 ...
##  $ X0_0_10 : num  152766 154485 160248 168237 179351 ...
##  $ X0_0_11 : num  144377 146154 151480 159292 169859 ...
##  $ X0_0_12 : num  155710 157550 163646 172042 183956 ...
##  $ X0_0_13 : num  162645 164493 170546 179213 190696 ...
##  $ X0_0_14 : num  147321 149000 154870 163220 174498 ...
##  $ X0_0_15 : num  157033 158879 164907 173405 185125 ...
##  $ X0_0_16 : num  150163 152295 158376 167082 178636 ...
##  $ X0_0_17 : num  149176 151136 157494 166112 177624 ...
##  $ X0_0_18 : num  143700 145653 150989 158896 169646 ...
##  $ X0_0_19 : num  148948 150521 156191 164286 175496 ...
##  $ X0_0_20 : num  145147 147187 152912 161073 171831 ...
##  $ X0_0_21 : num  142052 143518 149019 156608 167246 ...
##  $ X0_0_22 : num  146214 148212 153862 161936 172927 ...
##  $ X0_0_23 : num  140830 142385 148023 155925 166785 ...
##  $ X0_0_24 : num  134855 136393 141402 148690 158693 ...
##  $ X0_0_25 : num  149492 151342 156745 164620 175547 ...
##  $ X0_0_26 : num  141515 143191 148426 155987 166452 ...
##  $ X0_0_27 : num  143168 145143 150564 158322 168999 ...
##  $ X0_0_28 : num  137947 139703 144873 152631 163225 ...
##  $ X0_0_29 : num  130303 131934 136797 143906 153586 ...
##  $ X0_0_30 : num  140933 142587 147920 155537 165961 ...
##  $ X0_0_31 : num  132977 134740 139798 147036 156743 ...
##  $ X0_0_32 : num  152496 154112 159580 167378 178165 ...
##  $ X0_0_33 : num  149542 151175 156140 163476 173447 ...
##  $ X0_0_34 : num  136258 137622 142480 149640 159691 ...
##  $ X0_0_35 : num  142896 144491 149719 157530 167855 ...
##  $ X0_0_36 : num  146818 148602 153988 161864 172516 ...
##  $ X0_0_37 : num  139913 141419 146601 153831 163993 ...
##  $ X0_0_38 : num  143241 145000 150354 158056 168380 ...
##  $ X0_0_39 : num  143393 145107 150475 158123 168609 ...
##  $ X0_0_40 : num  135160 136890 141943 149198 158987 ...
##  $ X0_0_41 : num  134769 136296 141351 148585 158703 ...
##  $ X0_0_42 : num  144394 145768 150783 157971 167928 ...
##  $ X0_0_43 : num  138399 140104 145184 152572 162749 ...
##  $ X0_0_44 : num  124673 126352 131224 138081 147141 ...
##  $ X0_0_45 : num  114587 115846 120020 125885 134350 ...
##  $ X0_0_46 : num  110348 111676 116271 122616 131179 ...
##  $ X0_0_47 : num  117456 119363 124589 131727 140869 ...
##  $ X0_1_0  : num  22810 23128 23870 24799 25829 ...
##  $ X0_1_1  : num  145029 147034 153453 161576 173377 ...
##  $ X0_1_2  : num  126534 128386 133739 141434 152168 ...
##  $ X0_1_3  : num  139343 141248 147201 155430 167194 ...
##  $ X0_1_4  : num  139499 140724 145191 151964 161708 ...
##  $ X0_1_5  : num  144551 145739 150218 157163 166766 ...
##  $ X0_1_6  : num  143214 144462 149041 155715 165060 ...
##  $ X0_1_7  : num  151872 153319 158240 165366 175326 ...
##  $ X0_1_8  : num  141396 143111 148785 156579 167330 ...
##  $ X0_1_9  : num  160825 162958 168962 177289 188668 ...
##  $ X0_1_10 : num  133036 134553 139455 146616 156589 ...
##  $ X0_1_11 : num  152913 154811 160957 169438 180925 ...
##  $ X0_1_12 : num  143398 144777 150134 157720 168074 ...
##  $ X0_1_13 : num  155655 156998 161956 169314 179452 ...
##  $ X0_1_14 : num  132847 133928 138343 144997 154503 ...
##  $ X0_1_15 : num  137801 139640 145204 153077 163597 ...
##  $ X0_1_16 : num  133666 135220 140380 147735 157941 ...
##  $ X0_1_17 : num  145284 146605 152138 160181 170900 ...
##  $ X0_1_18 : num  133894 135037 139321 146012 155380 ...
##  $ X0_1_19 : num  142898 144185 149398 156778 167194 ...
##  $ X0_1_20 : num  134633 136386 141744 149450 159842 ...
##  $ X0_1_21 : num  141251 142372 147568 154874 165134 ...
##  $ X0_1_22 : num  136226 138091 143321 150859 161098 ...
##  $ X0_1_23 : num  141314 142956 148132 155448 165359 ...
##  $ X0_1_24 : num  129189 130531 135453 142561 152295 ...
##  $ X0_1_25 : num  140901 142546 147494 154677 164453 ...
##  $ X0_1_26 : num  134427 135692 140321 147041 156181 ...
##  $ X0_1_27 : num  138449 140122 145260 152646 162672 ...
##  $ X0_1_28 : num  132034 133511 138861 146125 156225 ...
##  $ X0_1_29 : num  141238 143038 147991 155146 164859 ...
##  $ X0_1_30 : num  132174 133310 138048 144650 153980 ...
##  $ X0_1_31 : num  140381 141863 145986 152715 162049 ...
##  $ X0_1_32 : num  126915 128482 133721 140874 150809 ...
##  $ X0_1_33 : num  139928 141649 146794 154421 164627 ...
##  $ X0_1_34 : num  129303 130638 135409 142312 151965 ...
##  $ X0_1_35 : num  136006 137780 143515 151541 162142 ...
##  $ X0_1_36 : num  124382 126026 131319 138754 148893 ...
##  $ X0_1_37 : num  135698 137395 143070 150813 161206 ...
##  $ X0_1_38 : num  120918 122818 128164 135497 145464 ...
##  $ X0_1_39 : num  135871 137852 143947 152148 163011 ...
##  $ X0_1_40 : num  124747 126548 131563 138408 147764 ...
##  $ X0_1_41 : num  132137 133926 139524 147177 157657 ...
##  $ X0_1_42 : num  137212 138874 144363 152079 162437 ...
##  $ X0_1_43 : num  146906 148537 153929 161533 172109 ...
##  $ X0_1_44 : num  129890 131522 136845 144248 154354 ...
##  $ X0_1_45 : num  135719 137406 142752 150099 160076 ...
##  $ X0_1_46 : num  115419 116985 121873 128508 137603 ...
##  $ X0_1_47 : num  122226 123630 127756 133947 142124 ...
##  $ X0_2_0  : num  201430 203145 208703 217131 228514 ...
##  $ X0_2_1  : num  136834 138043 142287 148472 157794 ...
##   [list output truncated]
  • Read and prepare the data: each scan-o-matic scanner can accommodate 4 plates. In this case the plates are arranged as below:

Plate0: Plate7_Basal
Plate1: Plate8_Basal
Plate2: Plate7_AceticAcid
Plate3: Plate8_AceticAcid

Each plate has 1536 colonies, i.e. 384 strains x 3 replicates + 384 spatial controls.

The FIRST COLUMN is the image number, with 0 being the first image.

After the first column there are 1536 * 4 = 6144 more columns, i.e. the data of each colony is one column. The naming format is as below:

X[plate_number]_[row_number]_[column_number]

All numbers start from zero. Therefore, plate numbers range from 0 to 3. Each 1536-colony plate has 32 rows and 48 columns, so row numbers range from 0 to 31 and column numbers from 0 to 47.

Now we extract the data of only 3 strains from the entire dataset, i.e. one strain that displayed acetic acid sensitivity, one strain with slight acetic acid tolerance, and one control strain. The selected strains and their respective positions are obtained from the raw dataset whole_data_CRISPRi_aa

Strain characteristics   Strain name     Plate number   Location1536   Colname Basal   Colname acetic
Acetic acid tolerant     "POL2-NRg-1"    Plate7         U4             X0_20_3         X2_20_3
Acetic acid sensitive    "RRP15-TRg-4"   Plate7         E4             X0_4_3          X2_4_3
Control strain           "CC23"          Plate8         AE23           X1_30_22        X3_30_22
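The mapping from a Location1536 label to a column name in the table above can be sketched as a small helper. The function name well_to_colname is hypothetical, and the row lettering (A to Z, then AA to AF) is an assumption inferred from the three examples in the table.

```r
# Hypothetical helper: convert a 1536-format location (e.g. "AE23") and a
# zero-based scanner plate index into the curves_flat column name
well_to_colname <- function(location, plate_index) {
  row_label  <- sub("[0-9]+$", "", location)             # leading letters, e.g. "AE"
  col_number <- as.integer(sub("^[A-Z]+", "", location)) # trailing digits, e.g. 23

  # Letters to a one-based row number: A = 1, ..., Z = 26, AA = 27, ...
  idx <- match(strsplit(row_label, "")[[1]], LETTERS)
  row_number <- sum(idx * 26^((length(idx) - 1):0))

  # Column names use zero-based row and column numbers
  paste0("X", plate_index, "_", row_number - 1, "_", col_number - 1)
}

# Plate7_Basal is scanner plate 0, Plate8_AceticAcid is scanner plate 3
tolerant_basal <- well_to_colname("U4", 0)
control_acetic <- well_to_colname("AE23", 3)
```

A helper of this kind avoids counting rows by hand when other strains are picked for plotting.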

Therefore, extract the data of the above columns, together with the first column containing the image number, and save them in a new variable. Then change the column names to the [gRNA]_[condition] format

Growth_curve_data_selected <- Growth_curve_data[, c("X", "X0_20_3", "X2_20_3", "X0_4_3", "X2_4_3", "X1_30_22", "X3_30_22")]
colnames(Growth_curve_data_selected) <- c("Time", "POL2-NRg-1_Basal", "POL2-NRg-1_Acetic", "RRP15-TRg-4_Basal", "RRP15-TRg-4_Acetic", "CC23_Basal", "CC23_Acetic")
str(Growth_curve_data_selected)
## 'data.frame':    255 obs. of  7 variables:
##  $ Time              : int  0 1 2 3 4 5 6 7 8 9 ...
##  $ POL2-NRg-1_Basal  : num  113652 114873 119049 124818 132739 ...
##  $ POL2-NRg-1_Acetic : num  85018 85018 85057 85317 85615 ...
##  $ RRP15-TRg-4_Basal : num  134222 135491 139949 146494 156017 ...
##  $ RRP15-TRg-4_Acetic: num  81686 81982 82500 83028 83565 ...
##  $ CC23_Basal        : num  125292 126829 131175 137350 145429 ...
##  $ CC23_Acetic       : num  96983 97229 97975 98813 99707 ...

Images are automatically taken 20 minutes apart. Therefore, image number * 20/60 gives the time point in hours. We convert the first column to time points accordingly.

Growth_curve_data_selected[, 1] <- Growth_curve_data_selected[, 1]*20/60

Convert the data.frame to long format and save it in a new variable

library(reshape)
# Note: the arguments below match base R's stats::reshape()
# (see the "reshapeLong" attribute in the output)
Growth_curve_data_selected_long <- reshape(data=Growth_curve_data_selected, idvar="Time",
                                     varying = colnames(Growth_curve_data_selected)[2:7],
                                     v.names=c("Population_size"),
                                     new.row.names = 1:30000,
                                     direction="long",
                                     timevar = "gRNA_condition",
                                     times = colnames(Growth_curve_data_selected)[2:7])
str(Growth_curve_data_selected_long)
## 'data.frame':    1530 obs. of  3 variables:
##  $ Time           : num  0 0.333 0.667 1 1.333 ...
##  $ gRNA_condition : chr  "POL2-NRg-1_Basal" "POL2-NRg-1_Basal" "POL2-NRg-1_Basal" "POL2-NRg-1_Basal" ...
##  $ Population_size: num  113652 114873 119049 124818 132739 ...
##  - attr(*, "reshapeLong")=List of 4
##   ..$ varying:List of 1
##   .. ..$ Population_size: chr [1:6] "POL2-NRg-1_Basal" "POL2-NRg-1_Acetic" "RRP15-TRg-4_Basal" "RRP15-TRg-4_Acetic" ...
##   .. ..- attr(*, "v.names")= chr "Population_size"
##   .. ..- attr(*, "times")= chr [1:6] "POL2-NRg-1_Basal" "POL2-NRg-1_Acetic" "RRP15-TRg-4_Basal" "RRP15-TRg-4_Acetic" ...
##   ..$ v.names: chr "Population_size"
##   ..$ idvar  : chr "Time"
##   ..$ timevar: chr "gRNA_condition"
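The wide-to-long conversion is easier to follow on a small table. The sketch below uses base R's stats::reshape() on hypothetical data with one strain in two conditions; the column names mirror the ones used above, the values are made up.

```r
# Hypothetical wide table: one column per strain/condition, one row per time point
wide <- data.frame(Time     = c(0, 1, 2),
                   A_Basal  = c(10, 12, 15),
                   A_Acetic = c(10, 10.5, 11))

# Stack the two measurement columns into one, labelled by their original column name
long <- reshape(wide,
                direction = "long",
                idvar     = "Time",
                varying   = c("A_Basal", "A_Acetic"),
                v.names   = "Population_size",
                timevar   = "gRNA_condition",
                times     = c("A_Basal", "A_Acetic"))
```

The result has one row per time point and condition (3 x 2 = 6 rows), which is the layout that grouped plotting functions expect.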
  • Plot the graph
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Figure 9 (Part of Fig. 1 in Manuscript): Representative growth curves


SCATTER PLOT : CORRELATION BETWEEN LPI GT MEAN ROUND 1 and LPI GT MEAN ROUND 2

Scatterplot displaying the reproducibility of the two scan-o-matic screenings. For each strain, the mean of the three LPI_GT replicates of round 1 is plotted on the X axis against the corresponding mean of round 2 on the Y axis. The data of the CRISPRi control strains are indicated with green dots, acetic acid sensitive strains with red dots and acetic acid tolerant strains with blue dots. Data of all other strains are indicated with black dots.

Figure 10 (Fig. 2A in Manuscript): DATA REPRODUCIBILITY


summary(stats_LPI_GT_Mean_RND1vsRND2_M3)
## 
## Call:
## lm(formula = LPI_GT_RND2_MEAN ~ LPI_GT_RND1_MEAN, data = Analysis_Final_3)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.71004 -0.04877 -0.00146  0.04439  1.05360 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -0.009341   0.001014  -9.215   <2e-16 ***
## LPI_GT_RND1_MEAN  0.274544   0.005532  49.631   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0886 on 8832 degrees of freedom
##   (244 observations deleted due to missingness)
## Multiple R-squared:  0.2181, Adjusted R-squared:  0.218 
## F-statistic:  2463 on 1 and 8832 DF,  p-value: < 2.2e-16
cor(Analysis_Final_3$LPI_GT_RND1_MEAN, 
    Analysis_Final_3$LPI_GT_RND2_MEAN,  
    method = "pearson", 
    use = "complete.obs")
## [1] 0.4669872

The linear regression model fitted to the data of all strains together (black dashed line) gave a coefficient of determination R2 = 0.22 and a Pearson correlation coefficient r = 0.47.
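
For a linear regression with a single predictor, the coefficient of determination equals the square of the Pearson correlation coefficient, which is why the two reported values agree (0.4669872 squared rounds to 0.2181; likewise 0.8887079 squared rounds to 0.7898 for the selected strains). The following is a minimal sketch on simulated data; `x` and `y` are hypothetical stand-ins for the round 1 and round 2 LPI_GT means, not the screening data.

```r
# Simulated data only; x and y are hypothetical stand-ins for the
# round 1 and round 2 LPI_GT means.
set.seed(1)
x <- rnorm(200)
y <- 0.3 * x + rnorm(200, sd = 0.5)

fit <- lm(y ~ x)
r_squared <- summary(fit)$r.squared
pearson_r <- cor(x, y, method = "pearson", use = "complete.obs")

# For a single-predictor model, R^2 is the square of Pearson's r.
all.equal(r_squared, pearson_r^2)

# The same check applied to the correlation reported above.
round(0.4669872^2, 4)
```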

summary(stats_LPI_GT_Mean_RND1vsRND2_M3_selected)
## 
## Call:
## lm(formula = LPI_GT_RND2_MEAN[c(candidate_padj_0.1_SEN_M3, candidate_padj_0.1_FIT_M3)] ~ 
##     LPI_GT_RND1_MEAN[c(candidate_padj_0.1_SEN_M3, candidate_padj_0.1_FIT_M3)], 
##     data = Analysis_Final_3)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.40299 -0.05268 -0.00899  0.03756  0.55375 
## 
## Coefficients:
##                                                                           Estimate
## (Intercept)                                                               0.000213
## LPI_GT_RND1_MEAN[c(candidate_padj_0.1_SEN_M3, candidate_padj_0.1_FIT_M3)] 0.581230
##                                                                           Std. Error
## (Intercept)                                                                 0.003588
## LPI_GT_RND1_MEAN[c(candidate_padj_0.1_SEN_M3, candidate_padj_0.1_FIT_M3)]   0.010154
##                                                                           t value
## (Intercept)                                                                 0.059
## LPI_GT_RND1_MEAN[c(candidate_padj_0.1_SEN_M3, candidate_padj_0.1_FIT_M3)]  57.240
##                                                                           Pr(>|t|)
## (Intercept)                                                                  0.953
## LPI_GT_RND1_MEAN[c(candidate_padj_0.1_SEN_M3, candidate_padj_0.1_FIT_M3)]   <2e-16
##                                                                              
## (Intercept)                                                                  
## LPI_GT_RND1_MEAN[c(candidate_padj_0.1_SEN_M3, candidate_padj_0.1_FIT_M3)] ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09715 on 872 degrees of freedom
##   (85 observations deleted due to missingness)
## Multiple R-squared:  0.7898, Adjusted R-squared:  0.7896 
## F-statistic:  3276 on 1 and 872 DF,  p-value: < 2.2e-16
cor(Analysis_Final_3$LPI_GT_RND1_MEAN[c(candidate_padj_0.1_SEN_M3, candidate_padj_0.1_FIT_M3)], 
    Analysis_Final_3$LPI_GT_RND2_MEAN[c(candidate_padj_0.1_SEN_M3, candidate_padj_0.1_FIT_M3)],  
    method = "pearson", 
    use = "complete.obs")
## [1] 0.8887079

The linear regression model fitted to the data of only the acetic acid sensitive and tolerant strains (red dashed line) gave R2 = 0.79 and a Pearson correlation coefficient r = 0.89.

SCATTER PLOT LSC GT (IN BASAL CONDITION) VS LPI_GT

Scatterplot showing the relative generation time of each CRISPRi strain in the basal condition on the X-axis [Log Strain Coefficient (LSC) of generation time (GT)] against its relative generation time under acetic acid stress (150 mM acetic acid) compared to the basal condition on the Y-axis (LPI_GT). Each point is the mean of all replicates (n=6). For 198 acetic acid sensitive strains, the number of replicates is between 3 and 5 (n=3 for 135 strains; n=4 for 16; n=5 for 47), as not all replicates managed to grow under acetic acid stress. The CRISPRi control strains are shown as green dots. Based on our statistical analysis, strains with an FDR-adjusted P-value ≤ 0.1 and a mean LPI_GT > 0.165 (the maximum LPI_GT of the CRISPRi control strains) are designated as acetic acid sensitive (red dots). Strains with an FDR-adjusted P-value ≤ 0.1 and a mean LPI_GT < -0.037 (the minimum LPI_GT of the CRISPRi control strains) are designated as acetic acid tolerant (blue dots). The LPI_GT threshold is indicated with a gray dashed line. Strains that fall outside the adjusted P-value and LPI_GT thresholds are shown as black dots.
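
The designation rule described above can be sketched as a small function. This is an illustration only: `classify_strains` and its argument names are hypothetical and are not the code used in the analysis. The default thresholds are the values quoted in the text.

```r
# Hypothetical helper; the function and its argument names are
# illustrative assumptions, not the code used in the analysis.
classify_strains <- function(lpi_gt_mean, padj,
                             upper = 0.165, lower = -0.037,
                             padj_cutoff = 0.1) {
  label <- rep("other", length(lpi_gt_mean))
  # Strains with a missing value are never called significant.
  significant <- !is.na(padj) & !is.na(lpi_gt_mean) & padj <= padj_cutoff
  label[significant & lpi_gt_mean > upper] <- "sensitive"
  label[significant & lpi_gt_mean < lower] <- "tolerant"
  label
}

# Toy input: four made-up strains.
toy_labels <- classify_strains(lpi_gt_mean = c(0.30, -0.10, 0.30, 0.05),
                               padj        = c(0.01,  0.05, 0.50, 0.01))
toy_labels
```

In this toy input, the third strain exceeds the LPI_GT threshold but fails the adjusted P-value cutoff, and the fourth passes the cutoff but lies within the range of the controls, so both are labelled "other".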

Figure 11 (Fig. 2C in Manuscript): Normalized generation time (LSC GT) of strains in Basal condition vs Relative generation time (LPI GT) of strains in acetic acid condition compared to basal condition

VIOLIN PLOT

Violin plots display the spread and the distribution of the LPI_GT data for all CRISPRi strains (ALL) and for the CRISPRi control strains (CONTROL).

  • Preparing a dataset for the violin plot of LPI GT for all strains and LPI GT for the control strains
Violin_LPI_Mean_M3 <- data.frame()
R <- length(which((Analysis_Final_3$Control.gRNA==0)
                  &(!is.na(Analysis_Final_3$LPI_GT_Mean_all))
))
Violin_LPI_Mean_M3[1:R, 1] <- Analysis_Final_3$LPI_GT_Mean_all[which((Analysis_Final_3$Control.gRNA==0)
                                                                     &(!is.na(Analysis_Final_3$LPI_GT_Mean_all)))]
Violin_LPI_Mean_M3[1:R, 2] <- "ALL"
R2 <- length(which(Analysis_Final_3$Control.gRNA==1))
Violin_LPI_Mean_M3[(R+1):(R+R2), 1] <- Analysis_Final_3$LPI_GT_Mean_all[which(Analysis_Final_3$Control.gRNA==1)]
Violin_LPI_Mean_M3[(R+1):(R+R2), 2] <- "CONTROL"
colnames(Violin_LPI_Mean_M3)[1:2] <- c("Mean", "Label")
Figure 12 (Fig. 2C INSET, in Manuscript): Violin-plots display the spread and the distribution of the LPI GT data

WORDCLOUD

We display the gene names that are highly represented among the tolerant (fit) and the sensitive strains, i.e. genes for which CRISPRi targeting by multiple gRNAs produced the tolerant or sensitive phenotype. The relationship between the CRISPRi repression of a gene and the observed phenotype is more reliable for these highly represented genes.

  • WORD CLOUD for the acetic acid TOLERANT strains
## Loading required package: RColorBrewer
Figure 13: Wordcloud for CRISPRi gene targets of acetic acid tolerant strains

  • WORD CLOUD for the acetic acid SENSITIVE strains
Figure 14: Wordcloud for CRISPRi gene targets of acetic acid sensitive strains

HISTOGRAM

STRAINS/GENE AND gRNA DISTANCE FROM TSS

First, assign the gRNA names as row names of the Analysis_Final_3 data.frame

row.names(Analysis_Final_3) <- Analysis_Final_3$gRNA_name

Next, for this graph we fetch some additional information from a .csv file available as supplementary data in Smith et al., 2017. The file is also available in our COMPILED_DATA folder.

Supplementary data from Smith et al., 2017 : smith_YEPGdata.csv

Smith_Yepg_data <- read.csv("COMPILED_DATA/smith_YEPGdata.csv", na.strings = "")
str(Smith_Yepg_data)
## 'data.frame':    8939 obs. of  28 variables:
##  $ ORF                       : chr  "YBL074C" "YBL074C" "YBL074C" "YBL074C" ...
##  $ gRNA_targeting_seq        : chr  "CCAGCGATAAGGAGGATCTT" "TGTGTCCTTTCTTCATCTCT" "AAAAGGAAAAAGTAATTAGG" "GTGAAAAGGAAAAAGTAATT" ...
##  $ Midpoint_TSS_dist         : int  -36 -135 -187 -190 -33 -90 -139 -178 -11 -15 ...
##  $ Norm_atac_seq_read_density: num  0.56 0.93 0.62 0.62 0.54 0.55 0.53 0.41 0.28 0.28 ...
##  $ Multiple_ORFs_Targeted    : int  0 0 0 0 0 0 1 1 0 0 ...
##  $ nearby_genes              : chr  NA NA NA NA ...
##  $ gene_name                 : chr  "AAR2" "AAR2" "AAR2" "AAR2" ...
##  $ guide_id                  : chr  "AAR2-NRg-3" "AAR2-NRg-4" "AAR2-TRg-15" "AAR2-TRg-16" ...
##  $ oligo_seq                 : chr  "GGGAGCTGCGATTGGCAGCCAGCGATAAGGAGGATCTTGTTTTAGAGCTAGAAATAGCAAG" "GGGAGCTGCGATTGGCAGTGTGTCCTTTCTTCATCTCTGTTTTAGAGCTAGAAATAGCAAG" "GGGAGCTGCGATTGGCAGAAAAGGAAAAAGTAATTAGGGTTTTAGAGCTAGAAATAGCAAG" "GGGAGCTGCGATTGGCAGGTGAAAAGGAAAAAGTAATTGTTTTAGAGCTAGAAATAGCAAG" ...
##  $ YPEG_._ATC1               : num  16.8 963.5 193.5 153.9 78.3 ...
##  $ YPEG_._ATC2               : num  15 1010.4 208.2 136.5 46.2 ...
##  $ YPEG_._ATC3               : num  14.3 896.3 268.5 123.7 34.5 ...
##  $ YPEG4                     : num  55.9 815.5 447.3 276.6 133.9 ...
##  $ YPEG5                     : num  54 577 427 450 159 ...
##  $ YPEG6                     : num  60.3 722.8 311.5 312.5 91.6 ...
##  $ YPD_._ATC7                : num  28.3 687.2 349.5 258.9 111.6 ...
##  $ YPD_._ATC8                : num  29.1 560.2 376.3 199.9 88.7 ...
##  $ YPD_._ATC9                : num  54.4 816.7 338.7 227.5 124.8 ...
##  $ YPD10                     : num  11.5 639 604.1 405.4 121.7 ...
##  $ YPD11                     : num  46.7 688 528.5 380.2 141.8 ...
##  $ YPD12                     : num  57 809 479 213 153 ...
##  $ Pool                      : int  2 2 2 2 1 2 1 1 1 1 ...
##  $ Log2_YPEG                 : num  -1.884 0.441 -0.823 -1.327 -1.276 ...
##  $ Log2_YPD                  : num  -0.043 -0.0491 -0.598 -0.5412 -0.3587 ...
##  $ YPEG_filter_25            : int  1 1 1 1 1 1 1 0 1 1 ...
##  $ YPD_filter_25             : int  1 1 1 1 1 1 1 0 1 1 ...
##  $ ORF_Category              : chr  "Essential" "Essential" "Essential" "Essential" ...
##  $ RNA_structure.Kcal.Mol.   : num  -60.6 -56.8 -50.5 -55.3 -56.4 -52.7 -54.6 -57.7 -60.7 -54.1 ...

Of these columns, the most useful for this study are:

  • Column No: 3 i.e. Midpoint_TSS_dist
  • Column No: 4 i.e. Norm_atac_seq_read_density
  • Column No: 5 i.e. Multiple_ORFs_Targeted
  • Column No: 6 i.e. nearby_genes

Extract only these four columns into the Analysis_Final_3 data.frame

for(i in 1:nrow(Analysis_Final_3)){
  x <- which(row.names(Analysis_Final_3)[i]==Smith_Yepg_data$guide_id)
  if(length(x)==0){
    Analysis_Final_3[i, 104:107] <- NA
  } else {
    Analysis_Final_3[i, 104:107] <- Smith_Yepg_data[x, 3:6]
  }
}
colnames(Analysis_Final_3)[104:107] <- colnames(Smith_Yepg_data)[3:6]
str((Analysis_Final_3)[104:107])
## 'data.frame':    9078 obs. of  4 variables:
##  $ Midpoint_TSS_dist         : int  -36 -135 -187 -190 -33 -90 -139 -178 -11 -15 ...
##  $ Norm_atac_seq_read_density: num  0.56 0.93 0.62 0.62 0.54 0.55 0.53 0.41 0.28 0.28 ...
##  $ Multiple_ORFs_Targeted    : int  0 0 0 0 0 0 1 1 0 0 ...
##  $ nearby_genes              : chr  NA NA NA NA ...
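
The row-by-row lookup above can also be written with `match()`, which returns the position of the first matching `guide_id` or `NA` when there is none. The sketch below uses small toy data frames (`analysis_toy`, `smith_toy`) as stand-ins, and it assumes that each `guide_id` occurs only once in the annotation table.

```r
# Toy stand-ins for Analysis_Final_3 and Smith_Yepg_data.
analysis_toy <- data.frame(
  gRNA_name = c("AAR2-NRg-3", "Ctrl-CC23", "AAR2-TRg-15")
)
smith_toy <- data.frame(
  guide_id               = c("AAR2-NRg-3", "AAR2-NRg-4", "AAR2-TRg-15"),
  Midpoint_TSS_dist      = c(-36L, -135L, -187L),
  Multiple_ORFs_Targeted = c(0L, 0L, 0L)
)

# match() gives the row of the first matching guide_id, or NA if none.
idx <- match(analysis_toy$gRNA_name, smith_toy$guide_id)

# Indexing rows with NA yields rows of NA, so unmatched strains get NA.
annot <- smith_toy[idx, c("Midpoint_TSS_dist", "Multiple_ORFs_Targeted")]
rownames(annot) <- NULL
analysis_toy <- cbind(analysis_toy, annot)
analysis_toy
```

Selecting the annotation columns by name rather than by position (104:107 and 3:6 above) keeps the lookup valid if columns are later added or reordered.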

Estimate the gRNA frequency (number of strains per target gene)

gRNA_Freq <- data.frame(sort(table(Analysis_Final_3$GENE), decreasing = TRUE))

Plot the graphs

Figure 15 (Fig. S7 in Manuscript): Histogram of the number of strains per target gene in the CRISPRi library (TOP PANEL). Histogram of the gRNA distance from the transcription start site (TSS) of the genes (BOTTOM PANEL)

NORMALIZED GENERATION TIME (LSC GT) IN BASAL AND UNDER ACETIC ACID STRESS

First, the mean LSC GT under acetic acid stress was recalculated for all strains, excluding the missing values

for(i in 1:nrow(Analysis_Final_3)){
  test1 <- t(Analysis_Final_3[i, 22:27])
  x1 <- sum(!is.na(test1[, 1]))
  AA_GT_Mean_temp <- mean(test1[which(!is.na(test1[, 1]))])
  Analysis_Final_3[i, 32] <- AA_GT_Mean_temp
}
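
A row-wise mean that skips missing values can also be computed without a loop. The sketch below uses a toy data frame `reps` with three made-up replicate columns; the real data has its values in columns 22:27 of Analysis_Final_3.

```r
# Toy stand-in for the replicate columns (values are made up).
reps <- data.frame(R1 = c(0.10, 0.40, NA),
                   R2 = c(0.12, NA,   NA),
                   R3 = c(0.08, 0.50, NA))
rep_cols <- c("R1", "R2", "R3")

# Row-wise mean over the replicates that are present.
# A row with no measurements at all yields NaN.
reps$Mean <- rowMeans(reps[, rep_cols], na.rm = TRUE)

# Number of replicates that contributed to each mean.
reps$n <- rowSums(!is.na(reps[, rep_cols]))
reps
```

Keeping the replicate count next to the mean makes it easy to see which strains were averaged over fewer replicates.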

Plot the histogram

Figure 16 (Fig. 2B in Manuscript): Histogram of strain growth in the basal condition (TOP PANEL). Histogram of strain growth at 150 mM acetic acid (BOTTOM PANEL)

BIOSCREEN LIQUID MICRO-CULTIVATION ANALYSIS

The results from the scan-o-matic phenomics were validated in liquid micro-cultivation growth experiments using the Bioscreen platform.

ATC TITRATION DATA ANALYSIS

Some CRISPRi strains were selected for a liquid growth experiment in the Bioscreen to identify an ATc concentration that induces a growth inhibition in liquid YNB medium (basal condition) similar to the one we observed in our Quantitative spot test assay on YNB agar medium with 7.5 µg/ml of ATc. Here we analyze that data set. These strains were selected based on the competitive growth assay of the CRISPRi library in liquid YPD medium with and without 250 ng/ml of ATc (Smith et al., 2017).

DATA PREPARATION FOR ATC DOSAGE RESPONSE

  • Compiled Data Import: The ATc titration data is available in compiled form in the COMPILED_DATA folder

ATc titration data compiled : ATc_liq_titer_data.csv

Atc_liq_data <- read.csv("COMPILED_DATA/ATc_liq_titer_data.csv", na.strings = "NaN", header = TRUE)
str(Atc_liq_data)
## 'data.frame':    80 obs. of  9 variables:
##  $ Well_No          : int  9 19 29 39 49 59 69 79 89 99 ...
##  $ gRNA_name        : chr  "ACT1-NRg-5" "ACT1-NRg-5" "ACT1-NRg-5" "ACT1-NRg-5" ...
##  $ Atc_concentration: num  0 0.25 1 2 3 5 7.5 10 15 25 ...
##  $ Lag_R1           : num  5.95 5.81 5.89 2.87 3.95 ...
##  $ GT_R1            : num  1.96 2.03 2.16 3.9 5.06 ...
##  $ Yield_R1         : num  2.98 3.17 2.77 1.42 1.03 ...
##  $ Lag_R2           : num  5.53 5.69 5.49 3.62 3.97 ...
##  $ GT_R2            : num  2 2.02 2.17 4.59 4.88 ...
##  $ Yield_R2         : num  2.87 3.02 2.96 1.21 1.05 ...

Note that in the liquid experiment, three phenotypes were estimated: the growth lag phase (LAG), the generation time (GT) and the biomass yield (YIELD).

  • Extract CRISPRi control strain (CC23) data
Atc_liq_cc23 <- Atc_liq_data[which(Atc_liq_data$gRNA_name=="Ctrl-CC23"), ]
  • Extract additional information such as the strain’s gRNA names and ATc concentrations used for titration
uniq_gRNA <- unique(Atc_liq_data$gRNA_name)
uniq_conc <- unique(Atc_liq_data$Atc_concentration)
  • Data transformation (log)
Atc_liq_data[, 10:15] <- log(Atc_liq_data[, 4:9])
  • Estimate normalized growth (LSC) for LAG, GENERATION TIME and YIELD

To determine the normalized growth, or Log Strain Coefficient (LSC), we use the data of the control strain CC23. Subtracting the log-transformed growth phenotype of CC23 from the log-transformed phenotype of each strain at the same ATc concentration gives the LSC value of that strain at that condition, for each phenotype.

for(i in 1:length(uniq_gRNA)){
  for(j in 1:length(uniq_conc)){
    Atc_liq_data[which(Atc_liq_data$gRNA_name==uniq_gRNA[i]&
                         Atc_liq_data$Atc_concentration==uniq_conc[j]), 16:21] <- 
      Atc_liq_data[which(Atc_liq_data$gRNA_name==uniq_gRNA[i]&
                           Atc_liq_data$Atc_concentration==uniq_conc[j]), 10:15] - 
      Atc_liq_data[which(Atc_liq_data$gRNA_name=="Ctrl-CC23"&
                           Atc_liq_data$Atc_concentration==uniq_conc[j]), 10:15]
  }
}
  • Estimate the Mean and Standard deviation of the LSC values for each phenotype
for(i in 1:nrow(Atc_liq_data)){
  Atc_liq_data[i, 22] <- mean(as.numeric(Atc_liq_data[i, c(16, 19)][which(!is.na(Atc_liq_data[i, c(16, 19)]))]))
  Atc_liq_data[i, 23] <- sd(as.numeric(Atc_liq_data[i, c(16, 19)][which(!is.na(Atc_liq_data[i, c(16, 19)]))]))
  Atc_liq_data[i, 24] <- mean(as.numeric(Atc_liq_data[i, c(17, 20)][which(!is.na(Atc_liq_data[i, c(17, 20)]))]))
  Atc_liq_data[i, 25] <- sd(as.numeric(Atc_liq_data[i, c(17, 20)][which(!is.na(Atc_liq_data[i, c(17, 20)]))]))
  Atc_liq_data[i, 26] <- mean(as.numeric(Atc_liq_data[i, c(18, 21)][which(!is.na(Atc_liq_data[i, c(18, 21)]))]))
  Atc_liq_data[i, 27] <- sd(as.numeric(Atc_liq_data[i, c(18, 21)][which(!is.na(Atc_liq_data[i, c(18, 21)]))]))
}
  • Assign new column names
colnames(Atc_liq_data)[10:15] <- paste0("log_", colnames(Atc_liq_data)[4:9])
colnames(Atc_liq_data)[16:21] <- paste0("LSC_", colnames(Atc_liq_data)[4:9])
colnames(Atc_liq_data)[22:23] <- paste0(c("Mean_", "SD_"), "LSC_Lag")
colnames(Atc_liq_data)[24:25] <- paste0(c("Mean_", "SD_"), "LSC_GT")
colnames(Atc_liq_data)[26:27] <- paste0(c("Mean_", "SD_"), "LSC_Yield")
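
The normalization steps above can be condensed into a small worked example. The sketch below uses made-up generation times for one strain and the control at two ATc concentrations. The data frame `toy` and the helper names (`gt_cols`, `ctrl`, `ctrl_idx`, `lsc`) are hypothetical, and the sketch assumes exactly one control row per concentration.

```r
# Toy data (made-up values): one strain and the control at two ATc doses,
# with generation time (GT) measured in two replicates.
toy <- data.frame(
  gRNA_name         = c("ACT1-NRg-5", "ACT1-NRg-5", "Ctrl-CC23", "Ctrl-CC23"),
  Atc_concentration = c(0, 7.5, 0, 7.5),
  GT_R1             = c(2.0, 4.0, 2.0, 2.0),
  GT_R2             = c(2.0, 8.0, 2.0, 2.0)
)
gt_cols <- c("GT_R1", "GT_R2")

# For every row, find the control row measured at the same ATc dose.
ctrl <- toy[toy$gRNA_name == "Ctrl-CC23", ]
ctrl_idx <- match(toy$Atc_concentration, ctrl$Atc_concentration)

# LSC = log(strain) - log(control at the same dose), per replicate.
lsc <- log(as.matrix(toy[, gt_cols])) -
  log(as.matrix(ctrl[ctrl_idx, gt_cols]))
colnames(lsc) <- paste0("LSC_", gt_cols)

# Mean and SD across replicates, ignoring missing values.
toy$Mean_LSC_GT <- unname(rowMeans(lsc, na.rm = TRUE))
toy$SD_LSC_GT   <- unname(apply(lsc, 1, sd, na.rm = TRUE))
toy
```

By construction, the control strain has an LSC of zero at every concentration, so a positive LSC for generation time means slower growth than the control at that ATc dose. With only two replicates per row, the standard deviation is a rough indication of spread at best.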

ATc DOSAGE RESPONSE VISUALIZATION BY SCATTER-PLOT

First, we make a subset of the dataset that includes data only for the following gRNA names: ACT1-NRg-5, ACT1-NRg-8, SEC21-NRg-5 and VPS1-TRg-1. These gRNAs were previously shown to induce strong CRISPRi-mediated repression that ultimately caused lethality or very poor growth. These strains were also used for the ATc titration on YNB agar plates by the Qualitative Spot-Test Assay. We also include another CRISPRi control strain, Ctrl-CC11, to show how it performed compared to the other strains.

name_gRNA_atc_titer <- c("ACT1-NRg-5", "ACT1-NRg-8", "Ctrl-CC11", "SEC21-NRg-5", "VPS1-TRg-1")
test <- data.frame()
Atc_titer_subset <- data.frame()
for(i in 1:length(name_gRNA_atc_titer)){
  test <- Atc_liq_data[which(Atc_liq_data$gRNA_name==name_gRNA_atc_titer[i]), ]
  Atc_titer_subset <- rbind(Atc_titer_subset, test)
}
  • PLOT NORMALIZED LAG, GENERATION TIME AND YIELD
Figure 17 (Fig. S8B in Manuscript): Scatter plot to display ATc dosage response on Lag phase

Figure 18 (Fig. S8B in Manuscript): Scatter plot to display ATc dosage response on Generation time

Figure 19 (Fig. S8B in Manuscript): Scatter plot to display ATc dosage response on Yield

VALIDATION DATA ANALYSIS

In order to validate the acetic acid sensitivity or tolerance observed for the CRISPRi strains in the scan-o-matic screening, selected strains were grown in liquid YNB medium using the Bioscreen platform. The 48 most acetic acid sensitive and the 50 most tolerant CRISPRi strains from the scan-o-matic analysis were selected for the validation. In addition, all CRISPRi strains with gRNAs targeting any of the following 12 genes were included: RPT4, RPN9, PRE4, MRPL10, MRPL4, SEC27, MIA40, VPS45, PUP3, VMA3, SEC62 and COG1. This gave a total of 176 strains, which were grown together with 7 control strains. The strains were grown in liquid YNB medium (basal condition) and in liquid YNB medium supplemented with 125 mM or 150 mM acetic acid. For each strain, 3 independent replicates were included for each growth condition.

VALIDATION DATA IMPORT